Abstract

The purpose of these notes is to survey the recent literature on dependence between two random variables and to stimulate research which will extend (some of) these concepts and relations to the case of 3 or more variables.

Part 1. Dependence and generalizations of correlation.

I) Introduction.

In  these  notes  we  are  concerned  mainly  with  bivariate  distributions. 
The  connection  between  dependence  and  the  theory  of  multivariate  hazard 
rates  is  also  briefly  discussed.  Extensions  of  positive  dependence  to 
association  and  other  notions  of  dependence  as  well  as  applications  and 
interrelations  are  included  in  Part  2 of  these  notes. 

II)  Positive  dependence.  (Lehmann,  1966). 

Two events A and B are called dependent if $P(A\cap B) = P(A)P(B)$ is violated. We say that there exists positive dependence if

$$P(A\cap B) \ge P(A)P(B).$$

Two random variables are said to be positively dependent if $P(X\in A,\ Y\in B) \ge P(X\in A)P(Y\in B)$ for any two Borel sets A and B on the real line. Negative dependence is defined by reversing the appropriate inequalities. The former case often occurs in reliability theory (parts of a machine usually have a longer life when they are "put together"). The latter case is prevalent in biological populations competing for limited resources. (See Barlow and Proschan (1975).)

From  Lehmann  (1966)  we  have  the  following  definitions: 

Def. 1. $(X,Y)$ or $F_{XY}(x,y)$ is positively quadrant dependent if:

$$P(X\le x,\ Y\le y) \ge P(X\le x)P(Y\le y)$$

or

$$F_{XY}(x,y) \ge F_X(x)F_Y(y) \quad \forall\, x,y. \tag{1.1}$$

Let $\mathcal{F}_1$ be the family of (two-dimensional) distributions for which (1.1) is valid, and $\mathcal{G}_1$ be the family of distributions for which (1.1) is valid with reversed inequality. The notation $(X,Y)\in\mathcal{F}_1\ (\mathcal{G}_1)$ means $F_{XY}\in\mathcal{F}_1\ (\mathcal{G}_1)$.

Lemma 1: (i) $(X,X)\in\mathcal{F}_1$

(ii) $(X,Y)\in\mathcal{F}_1 \iff (X,-Y)\in\mathcal{G}_1$

(iii) $P(X\le x,\ Y\le y) \ge P(X\le x)P(Y\le y)\ \ \forall x,y \iff$

$P(X\le x,\ Y<y) \ge P(X\le x)P(Y<y)\ \ \forall x,y \iff$

$P(X<x,\ Y<y) \ge P(X<x)P(Y<y)\ \ \forall x,y.$

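Proof: (i) $P(X\le x,\ X\le x) = P(X\le x) \ge P(X\le x)P(X\le x)$.

(ii) $\Rightarrow$:

$$\begin{aligned}
P(X\le x,\ -Y\le y) &= P(X\le x,\ Y\ge -y) \\
&= P(X\le x) - P(X\le x,\ Y<-y) \\
&= P(X\le x) - \lim_{n\to\infty} P(X\le x,\ Y\le -y-\tfrac1n) \\
&\le P(X\le x) - \lim_{n\to\infty} P(X\le x)\cdot P(Y\le -y-\tfrac1n) \\
&= P(X\le x) - P(X\le x)\cdot P(Y<-y) \\
&= P(X\le x)\bigl(1 - P(Y<-y)\bigr) = P(X\le x)P(-Y\le y).
\end{aligned}$$

For $\Leftarrow$ the proof is similar.

(iii) Since $P(X\le x,\ Y<y) = \lim_{n\to\infty} P(X\le x,\ Y\le y-\tfrac1n)$ and also $P(X\le x,\ Y\le y) = \lim_{n\to\infty} P(X\le x,\ Y<y+\tfrac1n)$, we proceed as in (ii).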


It  is  easy  to  verify  the  following. 
Lemma 2: $(X,Y)\in\mathcal{F}_1$

$\iff P(X\le x,\ Y\ge y) \le P(X\le x)P(Y\ge y)\ \ \forall x,y$
$\iff P(X\ge x,\ Y\le y) \le P(X\ge x)P(Y\le y)\ \ \forall x,y$
$\iff P(X\ge x,\ Y\ge y) \ge P(X\ge x)P(Y\ge y)\ \ \forall x,y.$

Remarks: 1. The signs $\ge$ and $\le$ can be replaced by $>$ and $<$ respectively.

2. The last inequality is called G-dependence by Lehmann (1966) and Johnson and Kotz (1975), and is frequently used in reliability theory where X and Y are interpreted as life-lengths of component parts of a machine. The validity of the last inequality follows from the simple relation:

$$G_{XY}(x,y) - G_X(x)G_Y(y) = F_{XY}(x,y) - F_X(x)F_Y(y),$$

where $G_{XY}(x,y) = P(X>x,\ Y>y)$ and $G_X(x) = P(X>x)$ and $G_Y(y) = P(Y>y)$.

We conclude this section by proving that $(X,Y)\in\mathcal{F}_1 \Rightarrow E(XY) \ge EX\,EY$ provided the covariance and expectations exist.

Lemma 3 (Hoeffding 1940): If $F$ denotes the joint and $F_X$ and $F_Y$ denote the marginal distributions of X and Y, then

$$E(XY) - EX\,EY = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [F(x,y) - F_X(x)F_Y(y)]\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bigl(G_{XY}(x,y) - G_X(x)G_Y(y)\bigr)\,dx\,dy,$$

provided the expectations on the l.h.s. all exist.
Proof: Define $I(u,x) = 1$ if $u<x$ and 0 otherwise.

Let $(X_1,Y_1)$, $(X_2,Y_2)$ be independent, each distributed according to $F$. Then

$$2\,\mathrm{Cov}(X_1,Y_1) = 2\bigl(E(X_1Y_1) - EX_1EY_1\bigr) = E(X_1-X_2)(Y_1-Y_2) = E\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [I(u,X_1) - I(u,X_2)][I(v,Y_1) - I(v,Y_2)]\,du\,dv.$$

(Details of the last step are given in Appendix 1.) Using Fubini's theorem we can take the expectation inside the integral sign thus obtaining:

$$\begin{aligned}
2\,\mathrm{Cov}(X_1,Y_1) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bigl(E\,I(u,X_1)I(v,Y_1) - E\,I(u,X_2)\,E\,I(v,Y_1) - E\,I(u,X_1)\,E\,I(v,Y_2) + E\,I(u,X_2)I(v,Y_2)\bigr)\,du\,dv \\
&\Bigl\{= 2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathrm{Cov}\bigl(I(u,X_1),\ I(v,Y_1)\bigr)\,du\,dv\Bigr\} \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bigl[G_{X,Y}(u,v) - G_X(u)G_Y(v) - G_X(u)G_Y(v) + G_{X,Y}(u,v)\bigr]\,du\,dv \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} 2\bigl[G_{X,Y}(u,v) - G_X(u)G_Y(v)\bigr]\,du\,dv.
\end{aligned}$$

Using Remark 2 following Lemma 2 we complete the proof. □


Lemma 3 implies:

Theorem 1: If $(X,Y)\in\mathcal{F}_1$ and $EXY$, $EX$, $EY$ exist, then $EXY \ge EX\,EY$ or $\mathrm{Cov}(X,Y) \ge 0$. Moreover, if $(X,Y)\in\mathcal{F}_1$, then $\mathrm{Cov}(X,Y) = 0 \Rightarrow$ X, Y are independent.
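As a numerical illustration of Lemma 3 and Theorem 1, the following Python sketch checks Hoeffding's identity on a small integer-valued distribution. The pmf and the helper name `hoeffding_check` are illustrative assumptions, not part of these notes. For integer-valued variables the integrand $F - F_XF_Y$ is constant on each unit cell, so the double integral reduces to a finite sum.

```python
from fractions import Fraction as Fr

def hoeffding_check(pmf):
    """pmf maps integer points (i, j) to probabilities (hypothetical input).

    Returns (covariance, Hoeffding integral, PQD flag)."""
    xs = sorted({i for i, _ in pmf})
    ys = sorted({j for _, j in pmf})

    def F(x, y):          # joint c.d.f. P(X <= x, Y <= y)
        return sum(p for (i, j), p in pmf.items() if i <= x and j <= y)

    def FX(x):            # marginal c.d.f. of X
        return sum(p for (i, _), p in pmf.items() if i <= x)

    def FY(y):            # marginal c.d.f. of Y
        return sum(p for (_, j), p in pmf.items() if j <= y)

    ex = sum(i * p for (i, _), p in pmf.items())
    ey = sum(j * p for (_, j), p in pmf.items())
    exy = sum(i * j * p for (i, j), p in pmf.items())
    cov = exy - ex * ey

    # F - FX*FY is constant on [a, a+1) x [b, b+1) and vanishes
    # outside the bounding box of the support.
    integral = sum(F(a, b) - FX(a) * FY(b)
                   for a in range(xs[0], xs[-1])
                   for b in range(ys[0], ys[-1]))

    # F is a step function, so checking (1.1) at the support grid suffices.
    pqd = all(F(a, b) >= FX(a) * FY(b) for a in xs for b in ys)
    return cov, integral, pqd

pmf = {(0, 0): Fr(3, 8), (0, 1): Fr(1, 8), (1, 0): Fr(1, 8), (1, 1): Fr(3, 8)}
cov, integral, pqd = hoeffding_check(pmf)
print(cov, integral, pqd)
```

For this pmf the covariance and the integral coincide and the distribution is positively quadrant dependent, in agreement with Theorem 1.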
Some additional properties of positive quadrant dependence.

Given:

(i): $P[X\le x,\ Y\le y] \ge P[X\le x]P[Y\le y]\ \ \forall\, x,y\in R^1$.

The following is valid:

(ii): $P[X>x,\ Y\le y] \le P[X>x]P[Y\le y]$

(iii): $P[X\le x,\ Y>y] \le P[X\le x]P[Y>y]$

(iv): $P[X>x,\ Y>y] \ge P[X>x]P[Y>y]$,

(v): (i) does not imply

$$P[x_1<X\le x_2,\ y_1<Y\le y_2] \ge P[x_1<X\le x_2]\,P[y_1<Y\le y_2] \qquad (-\infty<x_1,x_2,y_1,y_2<\infty).$$


Proofs:

(ii):

$$\begin{aligned}
P(X>x,\ Y\le y) &= P(Y\le y) - P(X\le x,\ Y\le y) \\
&\le P(Y\le y) - P(X\le x)P(Y\le y) \\
&= (1 - P(X\le x))P(Y\le y) \\
&= P(X>x)P(Y\le y).
\end{aligned}$$

(iii): Follows from (ii) by symmetry.

(iv): From (iii)

$$\begin{aligned}
P(X>x,\ Y>y) &= P(Y>y) - P(X\le x,\ Y>y) \\
&\ge P(Y>y) - P(X\le x)P(Y>y) \\
&= (1 - P(X\le x))P(Y>y) \\
&= P(X>x)P(Y>y).
\end{aligned}$$

Note that (iv) is just the Remark 2 above.

(v): Counterexample: Let (X,Y) be distributed according to the following table.

$P[X\le a,\ Y\le b] \ge P[X\le a]P[Y\le b]$ for $a = 1,2,3$ and $b = 1,2,3$.

P[X£l,  YSI]  . i 2 ^ X “ 

P[)U2.  ^ J > § (“) 

P[X«,  Y^2]  = § * ^ (J|) 

P[Xsl,  Y<2]  = ^ > J X ^ , and  so  on. 


However, $P[1.5\le X<2.5,\ 0.5\le Y\le 1.5] < P[1.5\le X<2.5]\cdot P[0.5<Y<1.5]$.

Indeed, $P[X=a,\ Y=b]$ cannot always be $\ge P[X=a]P[Y=b]$ (excluding the independent case) since both $\sum_b\sum_a P[X=a,\ Y=b] = 1$ and $\sum_b\sum_a P[X=a]P[Y=b] = 1$.
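Property (v) can also be seen on a very simple distribution. The sketch below uses an illustrative choice of our own, not the table of these notes: X uniform on {1, 2, 3} and Y = X, which is positively quadrant dependent by Lemma 1 (i). It checks that the rectangle inequality nevertheless fails for the rectangle $1.5 < X \le 2.5$, $0.5 < Y \le 1.5$. The helper names `prob` and `is_pqd` are hypothetical.

```python
from fractions import Fraction as Fr

# Illustrative pmf (an assumption for this sketch): X uniform on {1, 2, 3}, Y = X.
pmf = {(k, k): Fr(1, 3) for k in (1, 2, 3)}

def prob(pmf, in_x, in_y):
    """P(X in A, Y in B), with A and B given as predicates."""
    return sum(p for (i, j), p in pmf.items() if in_x(i) and in_y(j))

always = lambda t: True

def is_pqd(pmf, grid):
    """Check P(X<=a, Y<=b) >= P(X<=a) P(Y<=b) at every grid point."""
    for a in grid:
        for b in grid:
            joint = prob(pmf, lambda i: i <= a, lambda j: j <= b)
            px = prob(pmf, lambda i: i <= a, always)
            py = prob(pmf, always, lambda j: j <= b)
            if joint < px * py:
                return False
    return True

in_a = lambda i: 1.5 < i <= 2.5      # A = (1.5, 2.5]
in_b = lambda j: 0.5 < j <= 1.5      # B = (0.5, 1.5]
rect_joint = prob(pmf, in_a, in_b)
rect_product = prob(pmf, in_a, always) * prob(pmf, always, in_b)

print(is_pqd(pmf, (1, 2, 3)), rect_joint, rect_product)
```

Here the joint probability of the rectangle is 0 while the product of the marginal probabilities is 1/9, so quadrant dependence does not extend to arbitrary rectangles.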


NOTE: Inequalities (i) and (ii) imply that there must be Borel sets $A_1$, $B_1$ such that $P[X\in A_1,\ Y\in B_1] \le P[X\in A_1]P[Y\in B_1]$ and sets $A_2$, $B_2$ such that $P[X\in A_2,\ Y\in B_2] \ge P[X\in A_2]P[Y\in B_2]$ (with a strict inequality for at least one pair). For example we may choose

$$A_1 = [X>x],\quad A_2 = [X\le x] \quad\text{and}\quad B_1 = B_2 = [Y\le y].$$
III) Measures of dependence from information-theoretic aspects.

The following discussion develops a measure of dependence based on the concept of entropy suggested by C. E. Shannon (1948) nearly three decades ago.

Let X be a discrete random variable with a finite number of outcomes, i.e. $P(X=x_i) = p_i$, $i = 1,2,\dots,N$. Shannon defines the entropy or the measure of uncertainty (or information) as:

$$h(p) = -\sum_{i=1}^{N} p_i\log p_i, \qquad p = (p_1,p_2,\dots,p_N).$$

$h(p)$ assumes its maximum $\log N$ iff $p_i = 1/N$, $i = 1,\dots,N$.

The proof of this assertion as well as some other properties of $h(p)$ are presented in Appendix 2. Here we take the particular case N=2. In this case:

$$h(p,\ 1-p) = -p\log p - (1-p)\log(1-p); \qquad h(\tfrac12,\tfrac12) = \log 2.$$

We define $h(0,1) = \lim_{p\to 0} h(p,\ 1-p) = 0$, $h(1,0) = \lim_{p\to 1} h(p,\ 1-p) = 0$.

These limits exist by L'Hospital's rule. (Observe that an unbiased coin yields a higher uncertainty than a biased one.) Consider discrete r.v.'s X, Y with
$P(X=x_i,\ Y=y_j) = p_{ij}$, $P(X=x_i) = p_i$, $P(Y=y_j) = q_j$. We have $q_j = \sum_i p_{ij}$, $p_i = \sum_j p_{ij}$ and by the lemma in Appendix 2,

$$-\sum_{i,j} p_iq_j\log(p_iq_j) \ge -\sum_{i,j} p_{ij}\log p_{ij}.$$

This inequality suggests that we have a higher uncertainty when X and Y are independent. This fact is employed to define the logarithmic index of correlation $r_0$:

$$r_0 = \sum_{i,j}\bigl(p_{ij}\log p_{ij} - p_iq_j\log p_iq_j\bigr).$$

Note that $r_0 \ge 0$, and $r_0 = 0$ iff $p_{ij} = p_iq_j\ \forall\, i,j$. If (X,Y), X, Y possess densities $p(x,y)$, $p(x)$ and $q(y)$ respectively, we define $r_0$ as:

$$r_0 = \iint \bigl[p(x,y)\log p(x,y) - p(x)q(y)\log p(x)q(y)\bigr]\,dx\,dy.$$

We prove in Appendix 3 that $r_0 \ge 0$ also in this case.
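The discrete definition of $r_0$ can be sketched in a few lines of Python. The function name `log_index_of_correlation` and the two example matrices are assumptions made for illustration. Note that $r_0$ equals the entropy of X plus the entropy of Y minus the joint entropy, the quantity nowadays usually called mutual information.

```python
import math

def log_index_of_correlation(p):
    """r_0 = sum_ij (p_ij log p_ij - p_i q_j log(p_i q_j)) for a joint pmf matrix p.

    Terms with zero probability are taken as 0 (the limit t log t -> 0)."""
    rows = [sum(row) for row in p]                 # p_i = P(X = x_i)
    cols = [sum(col) for col in zip(*p)]           # q_j = P(Y = y_j)

    def xlogx(t):
        return t * math.log(t) if t > 0 else 0.0

    joint_term = sum(xlogx(pij) for row in p for pij in row)
    product_term = sum(xlogx(pi * qj) for pi in rows for qj in cols)
    return joint_term - product_term

independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]               # Y = X
r_indep = log_index_of_correlation(independent)
r_dep = log_index_of_correlation(dependent)
print(r_indep, r_dep)
```

For the independent table the index vanishes; for the table with Y = X it equals log 2, the entropy of a fair coin.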

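Example: Consider the bivariate normal distribution:

$$p(x,y) = \frac{\sqrt{ab-h^2}}{2\pi}\,e^{-\frac12(ax^2+2hxy+by^2)}.$$

The classical correlation coefficient is:

$$r = \frac{EXY - EX\,EY}{\sqrt{\mathrm{Var}\,X}\,\sqrt{\mathrm{Var}\,Y}} = \frac{-h}{\sqrt{ab}}.$$

The marginals are

$$p(x) = \int_{-\infty}^{\infty} p(x,y)\,dy = \sqrt{\alpha/\pi}\;e^{-\alpha x^2}; \qquad q(y) = \int_{-\infty}^{\infty} p(x,y)\,dx = \sqrt{\beta/\pi}\;e^{-\beta y^2},$$

$$\alpha = (ab-h^2)/2b, \qquad \beta = (ab-h^2)/2a.$$

(See also the next Section.) Computing $r_0$, we obtain

$$r_0 = \log\sqrt{\frac{ab}{ab-h^2}}.$$

Thus:

$$r_0 = \tfrac12\log\frac{1}{1-r^2}, \qquad\text{i.e.}\qquad r^2 = 1 - e^{-2r_0}.$$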

IV)  Correlation  coefficient  and  correlation  ratio. 

Let X, Y be r.v.'s with finite variances; define the correlation coefficient

$$R(X,Y) = \frac{E(XY) - E(X)E(Y)}{D(X)D(Y)},$$

where D(X), D(Y) are the standard deviations of X, Y, i.e. the positive roots of Var(X) and Var(Y). Assume Var(X), Var(Y) > 0, i.e. X and Y are non-degenerate. By the Cauchy-Schwarz inequality $[E(X-E(X))(Y-E(Y))]^2 \le E(X-EX)^2E(Y-EY)^2$, so $|R(X,Y)| \le 1$, and

$$R(X,Y) = 0 \iff \mathrm{Cov}(X,Y) = 0.$$

In the latter case we say that X and Y are uncorrelated; this implies that X and Y are independent if (X,Y) is bivariate normal; $R(X,Y) = \pm1$ if X and Y are linearly related ($Y = aX + b$ with $a\ne0$). Kolmogorov (see Rényi, 1959) defines the correlation ratio, as a measure of dependence, as follows:

$$K_X(Y) = \frac{D(E(Y|X))}{D(Y)}.$$

The relationship between $R(X,Y)$ and $K_X(Y)$ is given by

Theorem 2: If X and Y are random variables and Var(Y) exists, then:

$$K_X(Y) = \sup_g |R(Y,\ g(X))|,$$

where g runs over all Borel-measurable real-valued functions $y = g(x)$ such that the variance of $g(X)$ exists. Moreover, $K_X(Y) = |R(Y,\ g(X))|$ iff $g(X) = aE(Y|X) + b$ (a.s.), where $a\ne0$, b are constants.

(Remark: Notice that the theorem implies that $0 \le K_X(Y) \le 1$.)

Proof: Observe that $R(X,Y)$ and $K_X(Y)$ are both invariant under linear transformations. We thus may assume that $E(Y) = 0$, $E(g(X)) = 0$ and $D(Y) = D(g(X)) = 1$. Now $R(Y,\ g(X)) = E(g(X)Y)$ and since

$$E(g(X)Y|X) = g(X)E(Y|X),$$

$$E(g(X)Y) = E[E(g(X)Y|X)] = E(g(X)E(Y|X)),$$

we have

$$R^2(Y,\ g(X)) = E^2(g(X)E(Y|X)) \le E(g^2(X))\,E[E^2(Y|X)] = E[E^2(Y|X)]$$

(by the C.S. inequality). Hence $R^2(Y,\ g(X)) \le E(E^2(Y|X))$.

Since $0 = E(Y) = E(E(Y|X))$, we obtain:

$$R^2(Y,\ g(X)) \le E(E^2(Y|X)) - E^2(E(Y|X)) = \mathrm{Var}(E(Y|X)),$$

$$|R(Y,\ g(X))| \le D(E(Y|X)),$$

or

$$\sup_g |R(Y,\ g(X))| \le D(E(Y|X)).$$

Now let $g_0$ be a real function such that $g_0(X) = E(Y|X)$. (Recall that $E(Y) = 0$, $D(Y) = 1$.) In this case

$$R(Y,\ g_0(X)) = \frac{E(Y\,E(Y|X))}{D(E(Y|X))} = \frac{E(E^2(Y|X))}{D(E(Y|X))} = D(E(Y|X)).$$

Thus $D(E(Y|X)) = \sup_g |R(Y,\ g(X))|$, where g runs over all real functions such that $\mathrm{Var}(g(X))$ exists. Since

$$\begin{aligned}
|R(Y,\ g(X))| &= |E(Yg(X))| = |E[E(Yg(X)|X)]| \\
&= |E(g(X)E(Y|X))| \le [E g^2(X)\,E(E^2(Y|X))]^{1/2} \\
&= (E\,E^2(Y|X))^{1/2} = D(E(Y|X)),
\end{aligned}$$

the equality

$$K_X(Y) = D(E(Y|X)) = |R(Y,\ g(X))|$$

holds iff $g(X) = aE(Y|X) + b$ (a.s.) for some $a\ne0$ and b.
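To make the difference between $R(X,Y)$ and $K_X(Y)$ concrete, here is a minimal Python sketch for discrete distributions. The function names and the example (X uniform on {-1, 0, 1} with $Y = X^2$) are assumptions chosen for illustration: the two variables are uncorrelated although Y is a function of X.

```python
import math
from collections import defaultdict

def correlation_ratio(pmf):
    """K_X(Y) = D(E(Y|X)) / D(Y) for a discrete joint pmf {(x, y): prob}."""
    px = defaultdict(float)              # marginal of X
    sum_y_given_x = defaultdict(float)   # sum of y * p over each slice X = x
    for (x, y), p in pmf.items():
        px[x] += p
        sum_y_given_x[x] += y * p
    cond_mean = {x: sum_y_given_x[x] / px[x] for x in px}   # E(Y | X = x)

    ey = sum(y * p for (_, y), p in pmf.items())
    var_y = sum((y - ey) ** 2 * p for (_, y), p in pmf.items())
    var_cond = sum((cond_mean[x] - ey) ** 2 * px[x] for x in px)
    return math.sqrt(var_cond / var_y)

def correlation(pmf):
    """Ordinary correlation coefficient R(X, Y)."""
    ex = sum(x * p for (x, _), p in pmf.items())
    ey = sum(y * p for (_, y), p in pmf.items())
    cov = sum((x - ex) * (y - ey) * p for (x, y), p in pmf.items())
    vx = sum((x - ex) ** 2 * p for (x, _), p in pmf.items())
    vy = sum((y - ey) ** 2 * p for (_, y), p in pmf.items())
    return cov / math.sqrt(vx * vy)

# X uniform on {-1, 0, 1} and Y = X**2 (illustrative choice).
pmf = {(-1, 1): 1 / 3, (0, 0): 1 / 3, (1, 1): 1 / 3}
r_xy = correlation(pmf)
k_xy = correlation_ratio(pmf)
print(r_xy, k_xy)
```

In this example the correlation coefficient is zero while the correlation ratio is one, since $E(Y|X) = Y$.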


V)  Maximal  correlation. 

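We have $K_X(Y) = \sup_g |R(Y,\ g(X))|$; if $K_X(Y) = 0$, then $R(Y,\ g(X)) = 0$ for all g such that $E(g(X))^2 < \infty$. This implies that Y and $g(X)$ are uncorrelated, but does not yet assure that Y and X are independent. However, if we define the maximal correlation

$$S(X,Y) = \sup_{f,g} R(f(X),\ g(Y)),$$

where f and g run over all Borel-measurable functions such that $f(X)$ and $g(Y)$ have finite, positive variances, then it can be shown that X and Y are independent if $S(X,Y) = 0$.

Remark: $S(X,Y)$ also equals

$$\sup_{\substack{f,g \\ Ef^2(X)=Eg^2(Y)=1 \\ Ef(X)=Eg(Y)=0}} R(f(X),\ g(Y))$$

by the fact that $R(\cdot,\cdot)$ is invariant under linear transformations.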
Theorem 3:

$S_1$) $0 \le S(X,Y) \le 1$.

$S_2$) $S(X,Y) = S(Y,X)$.

$S_3$) If $\alpha(x)$ and $\beta(y)$ are strictly monotonic, then $S(X,Y) = S(\alpha(X),\ \beta(Y))$.

$S_4$) $|R(X,Y)| \le \min(K_X(Y),\ K_Y(X)) \le \max(K_X(Y),\ K_Y(X)) \le S(X,Y)$.

$S_5$) $S(X,Y) = 0$ iff X and Y are independent.

$S_6$) If there exists an arbitrary functional dependence between X and Y, i.e. if there exist Borel measurable functions $f_0(X)$ and $g_0(Y)$ such that $f_0(X)$ is not constant with probability 1 and $f_0(X) = g_0(Y)$, then $S(X,Y) = 1$.

Proof: $S_1$): $0 \le S(X,Y) \le 1$ since $-1 \le R(f(X),\ g(Y)) \le 1$. The non-negativity of $\sup_{f,g} R(f(X),\ g(Y))$ comes from the fact that if $Ef(X)g(Y) - Ef(X)Eg(Y) < 0$, we then consider $f' = -f$ to yield $R(f'(X),\ g(Y)) > 0$.

$S_2$): $S(X,Y) = S(Y,X)$ in view of the symmetry of $R(X,Y)$.

$S_3$): Note that in general $Ef^2(X) < \infty$ does not imply $Ef^2(\alpha(X)) < \infty$, i.e. $R(\alpha(X),\ \beta(Y))$ may not exist.

However if $\alpha$ and $\beta$ are strictly monotonic, then $\alpha^{-1}$ and $\beta^{-1}$ exist and

$$R(f(X),\ g(Y)) = R(f\alpha^{-1}(\alpha X),\ g\beta^{-1}(\beta Y)).$$

Thus $\sup_{f,g} R(f(X),\ g(Y)) \le \sup_{f',g'} R(f'(\alpha X),\ g'(\beta Y))$.

Also $\sup_{f',g'} R(f'(\alpha X),\ g'(\beta Y)) \le \sup_{f,g} R(f(X),\ g(Y))$.

Thus

$$S(X,Y) = S(\alpha X,\ \beta Y).$$
$S_4$) Since $K_X(Y) = \sup_g |R(Y,\ g(X))|$, $K_Y(X) = \sup_f |R(f(Y),\ X)|$, we have

$$K_X(Y) \ge |R(Y,X)|, \qquad K_Y(X) \ge |R(Y,X)|.$$

Hence, $|R(Y,X)| \le \min(K_X(Y),\ K_Y(X)) \le \max(K_X(Y),\ K_Y(X))$. Moreover,

$$S(X,Y) = \sup_{f,g} R(f(X),\ g(Y)) \ge \sup_g R(X,\ g(Y))$$

and $S(X,Y) \ge \sup_f R(f(X),\ Y)$.

Thus $|R(X,Y)| \le \min(K_X(Y),\ K_Y(X)) \le \max(K_X(Y),\ K_Y(X)) \le S(X,Y)$.

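$S_5$) Define indicator functions on $R^1$:

$$f_0(x) = \begin{cases} 1 & x\in A \\ 0 & \text{otherwise} \end{cases} \qquad g_0(y) = \begin{cases} 1 & y\in B \\ 0 & \text{otherwise} \end{cases}$$

where A and B are arbitrary Borel sets on the real line such that $0 < P(X\in A) < 1$ and $0 < P(Y\in B) < 1$. If $S(X,Y) = 0$ then $R(f_0(X),\ g_0(Y)) = 0$, i.e. $P(X\in A,\ Y\in B) = P(X\in A)P(Y\in B)$, so that X and Y are independent.

Remark: $S_6$) is a sufficient condition for $S(X,Y) = 1$.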


VI)  Mean-square  contingency. 


Let X, Y be arbitrary r.v.'s on $(\Omega,\mathcal{F},P)$. The distribution of (X,Y), denoted by $P_{XY}$, is defined on the plane by $P_{XY}(C) = P((X,Y)\in C)$, $C\in\mathcal{B}_2$, where $\mathcal{B}_2$ is the two-dimensional Borel σ-algebra. When $C = (-\infty,x]\times(-\infty,y]$ we have the "usual" distribution function:

$$F_{XY}(x,y) = P_{XY}((-\infty,x]\times(-\infty,y]) = P(X\le x,\ Y\le y).$$

If $P_{XY}$ is absolutely continuous w.r. to $PX^{-1}\times PY^{-1}$, the product measure induced by (X,Y), we have according to the Radon-Nikodym theorem,

$$P_{XY}(C) = \int_C K(x,y)\,dPX^{-1}dPY^{-1}, \tag{1}$$

where $K(x,y)$ is a Borel measurable function. The R-N derivative $K(x,y)$ is expressed symbolically as

$$\frac{dP_{XY}}{dPX^{-1}dPY^{-1}}.$$

We now define

$$\phi(X,Y) = \left\{\int_{R^2}\left[\frac{dP_{XY}}{dPX^{-1}dPY^{-1}} - 1\right]^2 dPX^{-1}dPY^{-1}\right\}^{1/2} \tag{2}$$

as the mean square contingency of X, Y.

"Historical" motivation for this definition:

Let X and Y be discrete r.v.'s with $P(A_k) = P(X=k)$, $k = 1,\dots,s$, $P(B_j) = P(Y=j)$, $j = 1,\dots,r$. Then:

$$\phi^2(X,Y) = \sum_{k=1}^{s}\sum_{j=1}^{r} \frac{[P(A_kB_j) - P(A_k)P(B_j)]^2}{P(A_k)P(B_j)}.$$


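If we have the r×s contingency table:

$$\begin{array}{c|ccccc|c}
 & 1 & 2 & 3 & \cdots & s & \\ \hline
1 & \nu_{11} & \nu_{12} & & \cdots & \nu_{1s} & \nu_{1\cdot} \\
2 & \nu_{21} & \nu_{22} & & \cdots & \nu_{2s} & \nu_{2\cdot} \\
3 & & & & & & \\
\vdots & & & & & & \\
r & \nu_{r1} & \nu_{r2} & & \cdots & \nu_{rs} & \nu_{r\cdot} \\ \hline
 & \nu_{\cdot1} & \nu_{\cdot2} & & \cdots & \nu_{\cdot s} & n
\end{array}$$

and we estimate $P(A_jB_i)$, $P(A_j)$, $P(B_i)$ by $\nu_{ij}/n$, $\nu_{\cdot j}/n$, $\nu_{i\cdot}/n$ respectively, then the estimated $\phi^2$ will be:

$$\hat\phi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \frac{(\nu_{ij}/n - \nu_{i\cdot}\nu_{\cdot j}/n^2)^2}{\nu_{i\cdot}\nu_{\cdot j}/n^2} = \frac1n\sum_{i=1}^{r}\sum_{j=1}^{s} \frac{(\nu_{ij} - \nu_{i\cdot}\nu_{\cdot j}/n)^2}{\nu_{i\cdot}\nu_{\cdot j}/n}.$$

Up to the factor n, this is the statistic for the $\chi^2$ test of independence ($n\hat\phi^2$ is asymptotically $\chi^2$ distributed with $(s-1)\times(r-1)$ degrees of freedom). Returning to (2), we have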

$$\phi(X,Y) = 0 \implies \frac{dP_{XY}}{dPX^{-1}dPY^{-1}} - 1 = 0 \quad\text{a.s.}$$

with respect to $PX^{-1}\times PY^{-1}$. In other words $K(x,y) = 1$ a.s. w.r. to $PX^{-1}\times PY^{-1}$. From (1) we have in this case

$$P_{XY}(C) = \int_C 1\,dPX^{-1}dPY^{-1} = PX^{-1}\times PY^{-1}(C) \quad\text{for all } C\in\mathcal{B}_2.$$

This implies that X, Y are independent. We thus have

Theorem: $\phi(X,Y) = 0 \iff$ X, Y are independent.
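The discrete form of the mean square contingency and its link to the $\chi^2$ statistic can be sketched as follows. The function name `mean_square_contingency_sq` and the table of counts are hypothetical; the sketch estimates the cell and marginal probabilities by relative frequencies, as described in the historical motivation above.

```python
def mean_square_contingency_sq(table):
    """Estimated phi^2 from an r x s table of counts (list of rows).

    Returns (phi^2 estimate, total count n)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    phi2 = 0.0
    for i, row in enumerate(table):
        for j, nu in enumerate(row):
            p_ij = nu / n                                # estimate of the cell probability
            p_prod = row_tot[i] * col_tot[j] / n ** 2    # product of estimated marginals
            phi2 += (p_ij - p_prod) ** 2 / p_prod
    return phi2, n

counts = [[10, 20], [30, 40]]          # hypothetical 2 x 2 table
phi2, n = mean_square_contingency_sq(counts)
chi2 = n * phi2                        # Pearson's chi-square statistic
print(phi2, chi2)
```

Multiplying the estimate by the sample size gives the usual Pearson statistic, which is why a value of $\hat\phi^2$ near zero is read as evidence for independence.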

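Definition: If X, Y are such that the measure $P(X,Y)^{-1}$ is absolutely continuous w.r. to $PX^{-1}\times PY^{-1}$ (i.e. the mean square contingency exists), it is said that X, Y are regularly dependent.

Theorem: If X, Y are regularly dependent, then

$$S(X,Y) \le \phi(X,Y).$$

Proof: Since $0 \le S(X,Y) \le 1$ we can assume that $\phi(X,Y) < 1$. Take measurable f and g such that

$$Ef(X) = Eg(Y) = 0 \quad\text{and}\quad \mathrm{Var}\,f(X) = \mathrm{Var}\,g(Y) = 1.$$

Direct calculations show that

$$R(f(X),\ g(Y)) = \int_{R^2} f(x)g(y)\left[\frac{dP(X,Y)^{-1}}{dPX^{-1}dPY^{-1}} - 1\right]dPX^{-1}dPY^{-1}. \tag{*}$$

Indeed

$$\begin{aligned}
R(f(X),\ g(Y)) &= \int_{R^2} f(x)g(y)\,dP(X,Y)^{-1} \\
&= \int_{R^2} f(x)g(y)\,d\bigl(P(X,Y)^{-1} - PX^{-1}PY^{-1}\bigr) \\
&= \int_{R^2} f(x)g(y)\left[\frac{dP(X,Y)^{-1}}{dPX^{-1}dPY^{-1}} - 1\right]dPX^{-1}dPY^{-1}.
\end{aligned}$$

(Observe that both $\int_C d\bigl(P(X,Y)^{-1} - PX^{-1}PY^{-1}\bigr)$ and $\int_C \left[\frac{dP(X,Y)^{-1}}{dPX^{-1}dPY^{-1}} - 1\right]dPX^{-1}dPY^{-1}$ are equal to $P((X,Y)\in C) - PX^{-1}\times PY^{-1}(C)$, for $C\in\mathcal{B}_2$.)

Hence

$$R(f(X),\ g(Y))^2 \le \int_{R^2} f^2(x)g^2(y)\,dPX^{-1}dPY^{-1}\cdot\int_{R^2}\left[\frac{dP(X,Y)^{-1}}{dPX^{-1}dPY^{-1}} - 1\right]^2 dPX^{-1}dPY^{-1} = \phi^2(X,Y)$$

(by C.-S. inequality).

Hence

$$R(f(X),\ g(Y))^2 \le \phi^2(X,Y)$$

or

$$\sup_{f,g} |R(f(X),\ g(Y))| \le \phi(X,Y),$$

i.e. $S(X,Y) \le \phi(X,Y)$.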


VII)  Pairwise  loosely  dependent  random  variables. 


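Definition: We say that $\{X_n\}$ is a sequence of pairwise loosely dependent r.v.'s with coefficient $C>0$ if

$$\sum_{n=1}^{\infty}\sum_{m=1}^{\infty} S(X_n,X_m)\,|u_n||u_m| \le C\sum_{i=1}^{\infty} u_i^2 \tag{*}$$

for each sequence $\{u_n\}$ such that $\sum_{i=1}^{\infty} u_i^2 < \infty$.

Theorem: $\{X_n\}$ is a family of pairwise independent r.v.'s iff one can take $C=1$.

Proof: If $\{X_n\}$ is independent, then

$$S(X_n,X_m) = \begin{cases} 1 & n=m \\ 0 & n\ne m. \end{cases}$$

We thus can take $C=1$ in (*).

Suppose $C=1$, define $u_n = |t_n|$ and assume $\sum t_n^2 < \infty$. We have:

$$\sum_{n}\sum_{m} S(X_n,X_m)\,u_nu_m \le \sum_{n} u_n^2.$$

Since $0 \le S(X_n,X_m)$ and $S(X_n,X_n) = 1$, and

$$\sum_{n}\sum_{m} S(X_n,X_m)\,u_nu_m = \sum_{n=1}^{\infty} u_n^2 + \sum_{n\ne m} S(X_n,X_m)\,u_nu_m,$$

we have $\sum_{n\ne m} S(X_n,X_m)\,u_nu_m \le 0$, hence $S(X_n,X_m) = 0$ for $n\ne m$, and $\{X_n\}$ is a pairwise independent family.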

The coefficient of dependence C has another property as well.

Theorem: Let $\{X_n\}$ be a sequence of loosely dependent r.v.'s with coefficient of dependence C. If Y is an arbitrary random variable such that $EY^2$ exists, then

$$\sum_{n=1}^{\infty} K^2_{X_n}(Y) \le C.$$
n=l 

For the proof of the theorem the following lemma is required:

Lemma: Let $\{X_n\}$ be a sequence of square integrable r.v.'s with bound $\theta>0$, i.e.

$$\sum_{n}\sum_{m} E(X_nX_m)\,u_nu_m \le \theta\sum_{n} u_n^2$$

for every sequence $\{u_n\}$ such that $\sum u_n^2 < \infty$. Then we have for any random variable Y with $EY^2 < \infty$,

$$\sum_{n} E^2(YX_n) \le \theta\,EY^2.$$

Proof: Consider

$$E\Bigl(Y - \frac1\theta\sum_{n=1}^{\infty} E(YX_n)X_n\Bigr)^2 \ge 0,$$

then

$$E\Bigl[Y^2 - \frac2\theta\sum_{n=1}^{\infty} X_nY\,E(YX_n) + \frac1{\theta^2}\Bigl(\sum_{n=1}^{\infty} X_nE(YX_n)\Bigr)^2\Bigr] \ge 0$$

$$\begin{aligned}
\Rightarrow\ EY^2 &\ge \frac2\theta\sum_{n=1}^{\infty} E^2(YX_n) - \frac1{\theta^2}\sum_{n,m} E(YX_n)E(YX_m)E(X_nX_m) \\
&\ge \frac2\theta\sum_{n=1}^{\infty} E^2(YX_n) - \frac1{\theta^2}\,\theta\sum_{n=1}^{\infty} E^2(YX_n) = \frac1\theta\sum_{n=1}^{\infty} E^2(YX_n).
\end{aligned}$$

(Remark: In Hilbert space theory, for any orthonormal system of r.v.'s $\{X_n\}$, i.e. $EX_nX_m = 0$, $n\ne m$, $(EX_n^2)^{1/2} = 1$, we have $\theta=1$ and $EY^2 \ge \sum_{n=1}^{\infty} E^2(YX_n)$ (the so-called Bessel inequality). The last lemma can be viewed as a generalization of the Bessel inequality.)

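To prove the theorem observe that

$$K_{X_n}(Y) = \frac{D(E(Y|X_n))}{D(Y)} = \sup_{Eg^2(X_n)<\infty} R(Y,\ g(X_n)).$$

Let $\{g_n\}$ be a sequence of measurable functions such that $Eg_n^2(X_n) < \infty$. (Such a sequence can always be constructed by choosing $g_n(X_n) = \dfrac{|f(X_n)|}{1+|f(X_n)|}$ for any measurable f.) Define

$$g_n^*(X_n) = \frac{g_n(X_n) - E(g_n(X_n))}{D(g_n(X_n))}, \qquad n = 1,2,\dots$$

Since $\{X_n\}$ is loosely dependent with coefficient of dependence C we have

$$\sum_{n}\sum_{m} E\bigl(g_n^*(X_n)g_m^*(X_m)\bigr)u_nu_m \le \sum_{n}\sum_{m} S(X_n,X_m)\,|u_n||u_m| \le C\sum_{n=1}^{\infty} u_n^2,$$

so that the sequence $\{g_n^*(X_n)\}$ satisfies the hypothesis of the Lemma with bound C. Using the Lemma with $Y' = Y - EY$, we have:

$$\sum_{n} E^2\bigl(Y'g_n^*(X_n)\bigr) \le C\,EY'^2 = C\,D^2(Y). \tag{*}$$

If we choose $g_n$ in a special way to be equal to $g_n(X_n) = E(Y'|X_n)$, we obtain

$$\sum_{n=1}^{\infty} E^2\left[Y'\,\frac{E(Y'|X_n) - E\,E(Y'|X_n)}{D(E(Y'|X_n))}\right] \le C\,D^2(Y),$$

i.e.

$$\sum_{n=1}^{\infty} \frac{E^2[Y'E(Y'|X_n)]}{D^2(E(Y'|X_n))} \le C\,D^2(Y).$$

Note that:

$$E(Y'E(Y'|X_n)) = E[E\{Y'E(Y'|X_n)|X_n\}] = E(E(Y'|X_n)E(Y'|X_n)) = E(E^2(Y'|X_n)) = D^2(E(Y'|X_n))$$

since $E\,E(Y'|X_n) = EY' = 0$.

Therefore the l.h.s. of (*) becomes $\sum_n D^2(E(Y'|X_n))$, i.e.

$$\sum_{n=1}^{\infty} D^2(E(Y'|X_n)) \le C\,D^2(Y). \tag{**}$$

However,

$$E(Y'|X_n) = E(Y - EY|X_n) = E(Y|X_n) - EY,$$

so that

$$D^2(E(Y'|X_n)) = D^2(E(Y|X_n)).$$

We finally obtain from (**):

$$\sum_{n=1}^{\infty} D^2(E(Y|X_n)) \le C\,D^2(Y),$$

i.e.

$$\sum_{n=1}^{\infty} \frac{D^2(E(Y|X_n))}{D^2(Y)} = \sum_{n=1}^{\infty} K^2_{X_n}(Y) \le C.$$

We know that C can be chosen to be 1 provided $\{X_n\}$ is a sequence of pairwise independent r.v.'s.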
Corollary: If $\{X_n\}$ is a sequence of pairwise independent r.v.'s, and Y is an arbitrary r.v. such that $EY^2 < \infty$, then

$$\sum_{n=1}^{\infty} K^2_{X_n}(Y) \le 1.$$
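The corollary can be checked exactly on a small example. In the sketch below (an illustration under our own assumptions, not taken from these notes) $X_1$, $X_2$ are independent fair bits, $X_3 = X_1 \oplus X_2$, so the three variables are pairwise independent, and $Y = X_1 + X_2$. The helper name `corr_ratio_sq` is hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

# Sample space: (X1, X2) independent fair bits, X3 = X1 xor X2.
# X1, X2, X3 are pairwise independent (but not mutually independent).
outcomes = [((a, b, a ^ b), Fr(1, 4)) for a, b in product((0, 1), repeat=2)]

def corr_ratio_sq(outcomes, n, y_of):
    """K^2_{X_n}(Y) = Var(E(Y | X_n)) / Var(Y), computed exactly."""
    ey = sum(y_of(x) * p for x, p in outcomes)
    var_y = sum((y_of(x) - ey) ** 2 * p for x, p in outcomes)
    var_cond = Fr(0)
    for value in {x[n] for x, _ in outcomes}:
        mass = sum(p for x, p in outcomes if x[n] == value)
        cond_mean = sum(y_of(x) * p for x, p in outcomes if x[n] == value) / mass
        var_cond += (cond_mean - ey) ** 2 * mass
    return var_cond / var_y

y_of = lambda x: x[0] + x[1]            # Y = X1 + X2 (illustrative choice)
ratios = [corr_ratio_sq(outcomes, n, y_of) for n in range(3)]
total = sum(ratios)
print(ratios, total)
```

The squared correlation ratios are 1/2, 1/2 and 0, so the bound of the corollary is attained in this example.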


VIII)  Positive  dependence  and  multivariate  hazard  rates. 

Given an m-dimensional random variable $X = (X_1,\dots,X_m)$, denote by

$$F_X(x) = \Pr\Bigl[\bigcap_{j=1}^{m}(X_j\le x_j)\Bigr]$$

the joint c.d.f. and by

$$G_X(x) = \Pr\Bigl[\bigcap_{j=1}^{m}(X_j>x_j)\Bigr]$$

the joint survival function. We assume that $F_X(x)$ is absolutely continuous.

Definition. If $h_X(x)_j$ is an increasing (decreasing) function of $x_j$, for $j = 1,2,\dots,m$, for all $x\in R^m$, then the distribution $F_X(x)$ is a (vector-)multivariate increasing hazard rate [IHR] (decreasing hazard rate [DHR]) distribution, where $h_X(x)_j$ is the j-th component of $-\mathrm{grad}\,\log G_X(x)$.

Lemma 1: If $X_1,\dots,X_m$ are mutually independent, i.e.

$$G_X(x) = \prod_{j=1}^{m} G_{X_j}(x_j),$$

then $h_X(x)_j = h_{X_j}(x_j)$, where $h_{X_j}(x_j) = -\dfrac{d}{dx_j}\log G_{X_j}(x_j) = f_{X_j}(x_j)/(1 - F_{X_j}(x_j))$ is the univariate hazard of the j-th component.

Lemma 2: If $h_X(x) = c$ where $c = (c_1,\dots,c_m)$ is an absolute constant, then $X = (X_1,\dots,X_m)$ are mutually independent exponential random variables and conversely.

Proof: Given $h_X(x) = c$, this implies that:

$$-\frac{\partial\log G_X(x)}{\partial x_j} = c_j \qquad (j = 1,\dots,m)$$

i.e.

$$G_X(x) = \exp(-c_jx_j)\;g_j(x_1,\dots,x_{j-1},x_{j+1},\dots,x_m), \qquad \forall\, j = 1,\dots,m,$$

i.e.

$$G_X(x) \propto \exp\Bigl(-\sum_{j=1}^{m} c_jx_j\Bigr).$$

The boundary conditions on $G_X(x)$ imply that

$$G_X(x) = \exp\Bigl(-\sum_{j=1}^{m} c_jx_j\Bigr), \qquad x_j \ge 0. \qquad □$$
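The hazard gradient $h_X(x) = -\mathrm{grad}\,\log G_X(x)$ can be approximated numerically for any smooth survival function. The sketch below is an illustration under our own assumptions (the function names, the rates in `c` and the central-difference step are hypothetical); it confirms the easy direction of Lemma 2, namely that independent exponentials have a constant hazard gradient.

```python
import math

def hazard_gradient(G, x, eps=1e-6):
    """Numerical h_X(x) = -grad log G(x) by central differences.

    G is a joint survival function taking a list of coordinates."""
    h = []
    for j in range(len(x)):
        up = list(x)
        up[j] += eps
        dn = list(x)
        dn[j] -= eps
        h.append(-(math.log(G(up)) - math.log(G(dn))) / (2 * eps))
    return h

c = [0.5, 2.0]                                   # hypothetical rates

def G_indep_exp(x):
    """Survival function of independent exponentials with rates c."""
    return math.exp(-sum(cj * xj for cj, xj in zip(c, x)))

h_a = hazard_gradient(G_indep_exp, [1.0, 0.3])
h_b = hazard_gradient(G_indep_exp, [2.5, 1.7])
print(h_a, h_b)
```

At both evaluation points the numerical gradient agrees with the vector of rates up to rounding error, independently of x.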

Recall from Section II that the variables $X = (X_1,X_2,\dots,X_m)$ are G-positively quadrant dependent if

$$R_G(x) = G_X(x)\Big/\prod_{j=1}^{m} G_{X_j}(x_j) \ge 1.$$

Observe that $\lim_{x_1\to-\infty} R_G(x_1,\dots,x_m) = R_G(x_2,\dots,x_m)$. In particular $\lim_{x_1\to-\infty} R_G(x_1,x_2) = 1$ (for $m=2$). Moreover

$$\frac{\partial\log R_G(x)}{\partial x_j} = h_{X_j}(x_j) - h_X(x)_j, \qquad j = 1,\dots,m.$$

Theorem: For $m=2$, if $h_{X_j}(x_j) \ge h_X(x)_j$ $(j = 1,2)$ for all x, then $R_G(x) \ge 1$ for all x, i.e. the variables $X_1$ and $X_2$ are G-positively quadrant dependent.

Proof: Observe that $\dfrac{\partial\log R_G(x)}{\partial x_j} \ge 0$ for all x and $j = 1,2$ together with $\lim_{x_1\to-\infty} R_G(x) = 1$ and $\lim_{x_2\to-\infty} R_G(x) = 1$, implies that $R_G(x) \ge 1$ for all x. □
The converse of this theorem is in general not true.

Consider the survival function:

$$G_{12}(x_1,x_2) = G_1(x_1)G_2(x_2)\bigl[1 + \alpha e^{-\lambda(x_1^2+x_2^2)}\bigr] \quad (\alpha>0), \qquad -\infty<x_1,x_2<\infty.$$

It is a survival function for some choices of $G_1(x_1)$ and $G_2(x_2)$ and certain values of $\alpha$. Indeed, first we have to assure that

3Xj^3x2  3Xj^3x2 


[f^Cxp  Xj^Gj(Xj)][f2(x2)  + X2G2(x2)] 
s f^(xpf2(x2)  - a|x^X2|e'’*‘^^>'^^2^  , 


vdiere  -5-^“  fx  , i “ 1,2.  Taking  f^(x.)  = *i|xJe  ^ , we  have 
dX^  1 J J J 


^ (>*-a)  lXj^X2le*’^^^‘'^^2)  >0  if  0 < a < Also:  Gj^2(*l»^2^  ® 

as  Xj  -►  +®;  -*-0  as  X2  -*•  +~;  -►  G^(x^)  as  Xj_j  -►  - ® . For  this 

survival function we have:

(i) $\dfrac{G_{12}(x_1,x_2)}{G_1(x_1)G_2(x_2)} = 1 + \alpha e^{-\lambda(x_1^2+x_2^2)} > 1$, so we have G-positive quadrant dependence, but

(ii) $R_G$ is an increasing function of $x_j$ for $x_j<0$ and a decreasing function of $x_j$ for $x_j>0$, so $\dfrac{\partial\log R_G}{\partial x_j}$ changes sign as $x_j$ increases.

However, for a family of bivariate survival distributions of the form:

$$G_{X_1,X_2}(x_1,x_2) = G_{X_1}(x_1)G_{X_2}(x_2)\bigl[1 + \alpha(1-G_{X_1}(x_1))(1-G_{X_2}(x_2))\bigr]$$

(note that the distribution in the counterexample above does not belong to this family) we have $R_G > 1 \iff h_{X_j}(x_j) - h_X(x)_j > 0$, $j = 1,2$.

The proof is presented in Appendix 4.

IX) Measures of dependence via "copulas".

a) Definition and properties of "copulas".

The quadrant dependence measures the deviation of the bivariate distribution from its marginals.

A more general problem is to relate (explicitly) a multivariate distribution function to its marginals. The FGM (Farlie-Gumbel-Morgenstern) family (discussed at the end of the previous section) expresses the joint bivariate distribution as an explicit function of the marginals:

$$F_{XY}(x,y) = F_X(x)F_Y(y)\bigl[1 + \alpha(1-F_X(x))(1-F_Y(y))\bigr], \qquad |\alpha| \le 1.$$

Other examples of this situation are:

$$F^-_{XY}(x,y) = \max\bigl(0,\ F_X(x)+F_Y(y)-1\bigr) \quad\text{(the lower Fréchet bound)}$$

$$F^+_{XY}(x,y) = \min\bigl(F_X(x),\ F_Y(y)\bigr) \quad\text{(the upper Fréchet bound)}$$

(or any linear combination with positive weights adding up to 1 of $F^-_{XY}(x,y)$ and $F^+_{XY}(x,y)$) and of course the independent case

$$F_{XY}(x,y) = F_X(x)\cdot F_Y(y).$$

The definition and the theorem below present an answer to the following two questions:
1) Can a given multivariate distribution function be represented as a function of its marginals?

2) What are the characteristics of this function if the answer to 1) is affirmative?

Definition: A copula C is a real-valued function of n variables $(n\ge2)$ defined on a subset of $[0,1]\times[0,1]\times\dots\times[0,1]$, with the range being a subset of the interval $[0,1]$, satisfying the following properties:

($C_1$) $C(1,\dots,1,\ x_m,\ 1,\dots,1) = x_m$, $m\le n$, $x_m\in[0,1]$,

($C_2$) $C(x_1,x_2,\dots,x_n) = 0$ if $x_m = 0$ for any $m\le n$,

($C_3$) C is non-decreasing in each variable.

Theorem: For $n\ge2$, let F be an n-dimensional distribution function with marginals $F_1,F_2,\dots,F_n$. Then there exists a copula C such that

$$F(x_1,x_2,\dots,x_n) = C\bigl(F_1(x_1),\ F_2(x_2),\dots,F_n(x_n)\bigr)$$

for all n-tuples $(x_1,x_2,\dots,x_n)\in R^n$.

Proof: To show that F is a function of $F_1,F_2,\dots,F_n$, consider any two points $x = (x_1,x_2,\dots,x_n)$ and $y = (y_1,\dots,y_n)\in R^n$.

We have

$$|F(x_1,x_2,\dots,x_n) - F(y_1,y_2,\dots,y_n)| \le |F_1(x_1)-F_1(y_1)| + |F_2(x_2)-F_2(y_2)| + \dots + |F_n(x_n)-F_n(y_n)|. \tag{*}$$

This inequality shows that the set of points

$$\bigl\{\bigl(F_1(x_1),\ F_2(x_2),\dots,F_n(x_n),\ F(x_1,x_2,\dots,x_n)\bigr)\ \big|\ x\in R^n\bigr\}$$

is a graph of a function C (if each $F_j$ is continuous then C is unique). Inequality (*) implies that $C(x_1,x_2,\dots,x_n)$ is a jointly continuous function of $x_1,x_2,\dots,x_n$. Utilizing the basic properties of distribution functions it can be easily verified that the function C satisfies the properties ($C_1$), ($C_2$) and ($C_3$). □
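The representation $F(x,y) = C(F_1(x), F_2(y))$ can be illustrated by building a joint distribution function from a copula and two marginals. The sketch below is a minimal example under our own assumptions: it uses the FGM copula with $\alpha = 0.5$ and exponential marginals (both hypothetical choices), and checks that the marginals are recovered, which is property ($C_1$), and that the function vanishes when one argument is zero, which is property ($C_2$).

```python
import math

def fgm_copula(u, v, alpha=0.5):
    """FGM copula C(u, v) = u v [1 + alpha (1 - u)(1 - v)], |alpha| <= 1."""
    return u * v * (1 + alpha * (1 - u) * (1 - v))

def exp_cdf(rate):
    """C.d.f. of an exponential distribution (illustrative marginal)."""
    return lambda x: 1 - math.exp(-rate * x) if x > 0 else 0.0

F1, F2 = exp_cdf(1.0), exp_cdf(3.0)

def joint_cdf(x, y):
    """F(x, y) = C(F1(x), F2(y))."""
    return fgm_copula(F1(x), F2(y))

big = 1e9                                   # stands in for +infinity
print(joint_cdf(0.7, big), F1(0.7))         # marginal of X is recovered
print(joint_cdf(big, 0.2), F2(0.2))         # marginal of Y is recovered
print(joint_cdf(0.7, 0.0))                  # zero when one marginal c.d.f. is 0
```

Any other copula and any other pair of marginal distribution functions could be substituted without changing the structure of the construction.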


b)  Copulas  and  dependence. 

Let $F_{XY}(u,v) = C_{XY}(F_X(u),\ F_Y(v))$. From Fréchet's bounds we obtain:

$$\max(x+y-1,\ 0) \le C_{XY}(x,y) \le \min(x,y), \qquad x,y\in[0,1].$$

If X and Y are independent, we have

$$C_{XY}(x,y) = xy, \qquad x,y\in[0,1].$$

These observations suggest that the volume between the two surfaces $z = C_{XY}(x,y)$ and $z = xy$ may serve as a measure of dependence between X and Y. This measure is formally defined (Schweizer and Wolff (1976)) by

$$\sigma(X,Y) = K\int_0^1\!\!\int_0^1 |xy - C_{XY}(x,y)|\,dx\,dy,$$

where K is chosen in such a manner that $\sigma(X,Y) \le 1$ for all X and Y. (Observe that $\sigma(X,Y) = 0 \iff$ X and Y are independent. Also $\sigma(X,Y) = \sigma(Y,X)$.)

Direct computations yield:

$$\int_0^1\!\!\int_0^1 |xy - \max(x+y-1,\ 0)|\,dx\,dy = \int_0^1\!\!\int_0^1 |xy - \min(x,y)|\,dx\,dy = \frac1{12}.$$

The normalizing constant K therefore equals 12 and $\sigma(X,Y) = 1$ for the Fréchet upper bound $\min(F_X(x),\ F_Y(y))$. Thus, finally,

$$\sigma(X,Y) = 12\int_0^1\!\!\int_0^1 |xy - C_{XY}(x,y)|\,dx\,dy.$$

According to this measure the maximal dependence between the variables is when $F_{X,Y}(x,y) = \min(F_X(x),\ F_Y(y))$, i.e. when the joint distribution is given by a diagonal-type surface over the "$F_X(x)$, $F_Y(y)$"-plane.
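The measure $\sigma$ is easy to approximate numerically once the copula is known. The following sketch uses a midpoint rule on a regular grid (the grid size and the function name `sigma` are our own assumptions) and evaluates it for the two Fréchet bounds, for independence and for the FGM copula with $\alpha = 1$.

```python
def sigma(copula, n=200):
    """Approximate 12 * integral over [0,1]^2 of |xy - C(x,y)| by the midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += abs(x * y - copula(x, y))
    return 12 * total * h * h

upper = lambda x, y: min(x, y)                       # Frechet upper bound
lower = lambda x, y: max(x + y - 1.0, 0.0)           # Frechet lower bound
indep = lambda x, y: x * y                           # independence
fgm = lambda x, y: x * y * (1 + (1 - x) * (1 - y))   # FGM copula with alpha = 1

results = {name: sigma(c) for name, c in
           [("upper", upper), ("lower", lower), ("indep", indep), ("fgm", fgm)]}
for name, value in results.items():
    print(name, round(value, 3))
```

The values are approximately 1 for both Fréchet bounds, 0 for independence and 1/3 for the FGM copula, for which the integral can also be computed in closed form as $12\,\alpha/36$.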


Part  2 Positive  dependence  revisited. 

Introduction. 

Recall the corollary of Theorem 1 Section II of Part 1 which states that if $(X,Y)\in\mathcal{F}_1$ and $EXY$, $EX$ and $EY$ exist, then $\mathrm{Cov}(X,Y) \ge 0$.

Esary, Proschan and Walkup (1967) define association of X and Y by requiring that $\mathrm{Cov}(f(X,Y),\ g(X,Y)) \ge 0$ for all non-decreasing real-valued functions f and g. They also present a multivariate version of this definition:

$$\mathrm{Cov}(f(X),\ g(X)) \ge 0 \quad\text{for non-decreasing real-valued } f \text{ and } g, \text{ where } X = (X_1,X_2,\dots,X_n).$$

In numerous reliability situations, the random variables of interest are usually not independent, but often satisfy the association property. (See Barlow and Proschan (1975).)

Regression dependence and likelihood ratio dependence were discussed by Lehmann (1966). The $TP_2$ property of Karlin (1968) is analogous to Lehmann's likelihood ratio dependence, which is useful for determination of the hazard rate behavior in univariate models.

II)  Association  of  random  variables. 

Definition 1: Random variables $(X_1,X_2,\dots,X_n)$ are called associated if $\mathrm{Cov}(f(X),\ g(X)) \ge 0$ for all non-decreasing functions f and g such that $Ef(X)$, $Eg(X)$, and $Ef(X)g(X)$ exist.

Remark: f, g are non-decreasing if they are non-decreasing for each variable when the rest are held fixed.

Property 2: Any subset of associated random variables forms a set of associated variables.

Property 3: If two sets of associated r.v.'s are independent of one another, then their union is a set of associated r.v.'s.

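Proof: Let $X = (X_1,\dots,X_n)$ and $Y = (Y_1,\dots,Y_m)$ be two sets of associated r.v.'s, let X and Y be independent and f and g be non-decreasing functions. We have:

$$\begin{aligned}
\mathrm{Cov}(f(X,Y),\ g(X,Y)) &= Ef(X,Y)g(X,Y) - Ef(X,Y)\,Eg(X,Y) \\
&= \int_{R^m}\!\int_{R^n} f(x,y)g(x,y)\,dPX^{-1}dPY^{-1} - \int_{R^m}\!\int_{R^n} f(x,y)\,dPX^{-1}dPY^{-1}\cdot\int_{R^m}\!\int_{R^n} g(x,y)\,dPX^{-1}dPY^{-1} \\
&= \int_{R^m}\Bigl[\int_{R^n} f(x,y)g(x,y)\,dPX^{-1} - \int_{R^n} f(x,y)\,dPX^{-1}\int_{R^n} g(x,y)\,dPX^{-1}\Bigr]dPY^{-1} \\
&\quad + \int_{R^m}\Bigl[\int_{R^n} f(x,y)\,dPX^{-1}\int_{R^n} g(x,y)\,dPX^{-1}\Bigr]dPY^{-1} \\
&\quad - \int_{R^m}\Bigl[\int_{R^n} f(x,y)\,dPX^{-1}\Bigr]dPY^{-1}\cdot\int_{R^m}\Bigl[\int_{R^n} g(x,y)\,dPX^{-1}\Bigr]dPY^{-1} \\
&= \mathrm{I} + \mathrm{II} - \mathrm{III}.
\end{aligned}$$

Now $\mathrm{I} \ge 0$ since $X_1,\dots,X_n$ are associated; also,

$$\mathrm{II} - \mathrm{III} = \mathrm{Cov}\Bigl(\int_{R^n} f(x,Y)\,dPX^{-1},\ \int_{R^n} g(x,Y)\,dPX^{-1}\Bigr) \ge 0,$$

because $\int_{R^n} f(x,y)\,dPX^{-1}$, $\int_{R^n} g(x,y)\,dPX^{-1}$ are non-decreasing in $y_1,y_2,\dots,y_m$, and the variables $Y = (Y_1,\dots,Y_m)$ are associated. □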

Property  4:  A set  consisting  of  a single  variable  is  associated. 

Proof: It is required to show that $\mathrm{Cov}(f(X),\ g(X)) \ge 0$ $\forall$ non-decreasing f and g.

Recall (Section II in Part 1) that Hoeffding's lemma yields

$$\mathrm{Cov}(X,Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bigl(G_{XY}(u,v) - G_X(u)G_Y(v)\bigr)\,du\,dv = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathrm{Cov}\bigl(I(u,X),\ I(v,Y)\bigr)\,du\,dv,$$

where $I(u,x) = 1$ if $u\le x$ and 0 otherwise, $I(v,x) = 1$ if $v\le x$ and 0 otherwise, and

$$G_{XY}(u,v) = P(X>u,\ Y>v), \quad G_X(u) = P(X>u), \quad G_Y(v) = P(Y>v).$$

Define  now 
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l(u,  f(x))  = 1 if  f(x)  2 u and  0 otherwise 

l(u,  g(x))  =1  if  g(x)  2 V and  0 otherwise. 

Then  as  in  the  proof  of  Hoeffding's  lesrma  (Lenma  3,  Part  1); 


Cov(f(X),  g(X))  - f f Cov[l(u,  f(X)),  I(v,  g(X))| 
/ .00  * -»  ^ ' 


dudv. 


Observe  that  l(u,  f(x))  and  l(v,  g(x))  are  two-Aralued  non-decreasing 
functions  of  x.  There  are  two  possibilities,  either  l(u,  f(x))  s;  l(v,  g(x)) 
for  all  X,  or  l(u,  f(x))  < l(v,  g(x))  for  all  x. 

In  the  first  case: 


Cov(f(X),  g(X))  - f_  f(X))l(v,  g(X))]  - e[i(u,  f(X))]  x 
X e[i(v,  g(X))]|dudv 

= f(X))]E[l(v,  g(X))]dudv 

* r r f(X))]]dudv  2 0. 


In  the  second  case: 


Cov(f(X),  g(X))  ~ f_  f_  e[i(u,  f(X))]  1 - e[i(v,  g(X))]dudv  s 0 
and  Prop.  4 is  verified. 


□ 
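Property 4 can be checked numerically on a small example. The sketch below is illustrative only: the four-point distribution, the thresholds and the helper names (`covariance`, `step`) are assumptions chosen for the illustration, not part of the notes. It computes $\operatorname{Cov}(f(X), g(X))$ exactly for pairs of non-decreasing step functions and for one pair of smoother non-decreasing functions.

```python
from itertools import product


def covariance(values, probs, f, g):
    """Exact Cov(f(X), g(X)) for a discrete X with the given support and pmf."""
    ef = sum(p * f(x) for x, p in zip(values, probs))
    eg = sum(p * g(x) for x, p in zip(values, probs))
    efg = sum(p * f(x) * g(x) for x, p in zip(values, probs))
    return efg - ef * eg


def step(t):
    """Non-decreasing two-valued function: 1 if x > t, else 0."""
    return lambda x: 1.0 if x > t else 0.0


# Hypothetical distribution used only for this check.
values = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]

# Covariances of all pairs of indicator functions, as in the proof above.
thresholds = [0.5, 1.5, 2.5]
covs = [
    covariance(values, probs, step(u), step(v))
    for u, v in product(thresholds, repeat=2)
]
min_cov = min(covs)

# A pair of general non-decreasing functions.
cov_general = covariance(values, probs, lambda x: x ** 3, lambda x: min(x, 2))
```

Every covariance computed here should be non-negative, in line with the two cases treated in the proof.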


Property 5: Non-decreasing functions of associated random variables are associated.

Proof: Let $X_1, X_2, \dots, X_n$ be associated random variables and let $h_i$ $(i = 1,\dots,m)$ be non-decreasing functions of $n$ variables. To show that $Y_i = h_i(X)$, $i = 1,\dots,m$, are associated it is sufficient to show that $\operatorname{Cov}(f(Y), g(Y)) \ge 0$ for all non-decreasing $f$ and $g$ of $m$ variables.

However, $\operatorname{Cov}(f(Y), g(Y)) = \operatorname{Cov}[f(h(X)),\, g(h(X))]$, where $h = (h_1,\dots,h_m)$. The compositions $f(h(\cdot))$ and $g(h(\cdot))$ are non-decreasing functions of $n$ variables, so the latter covariance is non-negative since $X_1,\dots,X_n$ are associated. □

Property 6: Independent random variables are associated.

Proof: Let $X_1, X_2, \dots, X_n$ be independent random variables.

By Prop. 4, $X_1$ is associated and $X_2$ is associated.

By Prop. 3, $(X_1, X_2)$ are associated, and the result follows by induction. □

III) Positive regression dependence.

Recall that $(X,Y) \in F_1$, i.e. $X$ and $Y$ have positive quadrant dependence (notation PQD(X,Y)), if $P(X\le x,\ Y\le y) \ge P(X\le x)P(Y\le y)$ for all $x, y \in R_1$. If $P(X\le x) > 0$, this condition can be restated as $P(Y\le y \mid X\le x) \ge P(Y\le y)$ for all $x, y$. This observation motivates the following two notions:

Definition 2: If $P(Y\le y \mid X\le x) \downarrow$ as $x \uparrow$ for all $y$, we say that $Y$ is left tail decreasing in $X$ (notation LTD(Y|X)).

Definition 3: If $P(Y\le y \mid X = x) \downarrow$ as $x \uparrow$ for all $y$, we say that $Y$ is positively regression dependent on $X$ (notation PRD(Y|X) or $(X,Y) \in F_2$).

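Example 1 (Lehmann 1966): Let $Y = \alpha + \beta X + U$, where $X$ and $U$ are independent r.v.'s. In this case:

\[
\begin{aligned}
P(Y\le y \mid X = x) &= E\big(I_{[Y\le y]} \mid X = x\big) = E\big(I_{[\alpha+\beta X+U\le y]} \mid X = x\big) \\
&= E\big(I_{[\alpha+\beta x+U\le y]} \mid X = x\big) = E\,I_{[\alpha+\beta x+U\le y]} = P(\alpha + \beta x + U \le y).
\end{aligned}
\]

Thus, $P(Y\le y \mid X = x) \downarrow$ as $x \uparrow$ if $\beta > 0$ [PRD(Y|X)],

$P(Y\le y \mid X = x) \uparrow$ as $x \uparrow$ if $\beta < 0$ [NRD(Y|X)],

and $P(Y\le y \mid X = x) = P(\alpha + U \le y)$ if $\beta = 0$.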
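The monotonicity in Example 1 can be illustrated numerically. The sketch below assumes, purely for illustration, that $U$ is standard normal, so that $P(Y\le y \mid X = x) = \Phi(y - \alpha - \beta x)$; the parameter values and the names `normal_cdf` and `cond_cdf` are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cond_cdf(y, x, alpha, beta):
    """P(Y <= y | X = x) = P(alpha + beta*x + U <= y), assuming U ~ N(0, 1)."""
    return normal_cdf(y - alpha - beta * x)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# beta > 0: the conditional probability decreases in x (PRD).
prd = [cond_cdf(0.5, x, alpha=1.0, beta=0.8) for x in xs]

# beta < 0: the conditional probability increases in x (NRD).
nrd = [cond_cdf(0.5, x, alpha=1.0, beta=-0.8) for x in xs]

# beta = 0: the conditional probability does not depend on x.
flat = [cond_cdf(0.5, x, alpha=1.0, beta=0.0) for x in xs]
```

Any other distribution for $U$ would give the same qualitative picture, since only the monotonicity of its distribution function is used.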

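Theorem 1: PRD(Y|X) ⇒ LTD(Y|X) ⇒ PQD(X,Y).

Proof: From Def. 2, $P(Y\le y \mid X\le x) \ge P(Y\le y \mid X\le x')$ for $x < x'$.

Let $x' \to \infty$; then

\[
P(Y\le y \mid X\le x) \ge \lim_{x'\to\infty} P(Y\le y \mid X\le x') = P(Y\le y).
\]

Thus LTD(Y|X) ⇒ PQD(X,Y).

Now, LTD(Y|X) holds if

\[
P(Y\le y \mid X\le x) \ge P(Y\le y \mid X\le x') \quad\text{for all } x < x' \text{ and all } y,
\]

that is, if

\[
\frac{\int_{-\infty}^{x} P(Y\le y \mid X = u)\,dP(X\le u)}{P(X\le x)} \ \ge\ \frac{\int_{-\infty}^{x'} P(Y\le y \mid X = u)\,dP(X\le u)}{P(X\le x')}
\]

with $x < x'$. Each side is the average of $P(Y\le y \mid X = u)$ over $u \le x$ (respectively $u \le x'$) with respect to the distribution of $X$. If $P(Y\le y \mid X = u)$ is a decreasing function of $u$ (i.e. PRD(Y|X) holds), enlarging the range of averaging can only lower the average, so the last inequality is valid. Thus PRD(Y|X) ⇒ LTD(Y|X). □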




IV) Relationships between some notions of bivariate positive (negative) dependence.

In addition to the three definitions of bivariate dependence introduced in Section III) we define:

Definition 4: $Y$ is right tail increasing in $X$ (RTI(Y|X)) if $P(Y>y \mid X>x) \uparrow$ as $x \uparrow$ for all $y$.

Definition 5: $Y$ is stochastically increasing in $X$ (SI(Y|X)) if $P(Y>y \mid X = x) \uparrow$ as $x \uparrow$ for all $y$.

Definition 6: If $X$, $Y$ have joint density $f(x,y)$, we say that $f(x,y)$ is TP2, or TP2(X,Y), if

\[
\begin{vmatrix} f(x_1,y_1) & f(x_1,y_2) \\ f(x_2,y_1) & f(x_2,y_2) \end{vmatrix} \ge 0
\quad\text{for all } x_1 < x_2 \text{ and } y_1 < y_2 \text{ in the domain of } X \text{ and } Y.
\]
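For a distribution on a finite grid, the TP2 condition of Definition 6 and the PQD condition recalled in Section III can both be tested by direct enumeration. The sketch below is a discrete analogue, with a probability table standing in for the density $f(x,y)$; that substitution, the two example tables and the function names `is_tp2` and `is_pqd` are assumptions made for the illustration.

```python
def is_tp2(p, tol=1e-12):
    """Check that every 2x2 minor of the table p (rows: x, columns: y) is >= 0."""
    rows, cols = len(p), len(p[0])
    for i1 in range(rows):
        for i2 in range(i1 + 1, rows):
            for j1 in range(cols):
                for j2 in range(j1 + 1, cols):
                    minor = p[i1][j1] * p[i2][j2] - p[i1][j2] * p[i2][j1]
                    if minor < -tol:
                        return False
    return True


def is_pqd(p, tol=1e-12):
    """Check P(X<=x, Y<=y) >= P(X<=x) P(Y<=y) at every grid point."""
    rows, cols = len(p), len(p[0])
    for i in range(rows):
        for j in range(cols):
            joint = sum(p[a][b] for a in range(i + 1) for b in range(j + 1))
            fx = sum(p[a][b] for a in range(i + 1) for b in range(cols))
            fy = sum(p[a][b] for a in range(rows) for b in range(j + 1))
            if joint < fx * fy - tol:
                return False
    return True


# Mass concentrated on the diagonal: positively dependent.
tp2_table = [[0.30, 0.10],
             [0.10, 0.50]]

# Mass concentrated off the diagonal: negatively dependent.
not_tp2 = [[0.10, 0.40],
           [0.40, 0.10]]
```

Here `is_tp2(tp2_table)` and `is_pqd(tp2_table)` both hold, while both checks fail for `not_tp2`.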

In addition to the relations, due to Lehmann (1966),

PRD ⇒ LTD ⇒ PQD

which were proved in the previous section, the following has been shown by Esary and Proschan.

Theorem 2 (Esary and Proschan 1972):

TP2(X,Y) ⇒ SI(Y|X) ⇒ RTI(Y|X) ⇒ PQD(X,Y).


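Proof: If TP2(X,Y) is valid, then by definition

\[
\begin{vmatrix} f(x_1,y_1) & f(x_1,y_2) \\ f(x_2,y_1) & f(x_2,y_2) \end{vmatrix} \ge 0, \qquad x_1 < x_2,\ y_1 < y_2,
\]

or $f(x_1,y_1)f(x_2,y_2) \ge f(x_1,y_2)f(x_2,y_1)$.

Integrating over the variables $y_1$ (from $-\infty$ to $y$) and $y_2$ (from $y$ to $\infty$) we have:

\[
\int_{-\infty}^{y} f(x_1,y_1)\,dy_1 \int_{y}^{\infty} f(x_2,y_2)\,dy_2 \ \ge\ \int_{-\infty}^{y} f(x_2,y_1)\,dy_1 \int_{y}^{\infty} f(x_1,y_2)\,dy_2
\]

or

\[
\begin{vmatrix} \int_{y}^{\infty} f(x_1,z)\,dz & \int_{y}^{\infty} f(x_2,z)\,dz \\[4pt] \int_{-\infty}^{y} f(x_1,z)\,dz & \int_{-\infty}^{y} f(x_2,z)\,dz \end{vmatrix} \le 0.
\]

Adding the first row of the last determinant to the second we obtain

\[
\begin{vmatrix} \int_{y}^{\infty} f(x_1,z)\,dz & \int_{y}^{\infty} f(x_2,z)\,dz \\[4pt] f_1(x_1) & f_1(x_2) \end{vmatrix} \le 0,
\]

where $f_1$ denotes the marginal density of $X$, or

\[
\frac{\int_{y}^{\infty} f(x_1,z)\,dz}{f_1(x_1)} \le \frac{\int_{y}^{\infty} f(x_2,z)\,dz}{f_1(x_2)},
\]

or $P(Y>y \mid X = x_1) \le P(Y>y \mid X = x_2)$ for $x_1 < x_2$, i.e. SI(Y|X) is valid. Thus TP2(X,Y) ⇒ SI(Y|X). Now let $X$ and $Y$ satisfy SI(Y|X).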

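By definition:

\[
P(Y>y \mid X = x_1) \le P(Y>y \mid X = x_2) \quad\text{for } x_1 < x_2 \text{ and all } y.
\]

Equivalently,

\[
\frac{\int_{y}^{\infty} f(x_1,z)\,dz}{f_1(x_1)} \le \frac{\int_{y}^{\infty} f(x_2,z)\,dz}{f_1(x_2)},
\]

or

\[
f_1(x_2)\int_{y}^{\infty} f(x_1,z)\,dz \ \le\ f_1(x_1)\int_{y}^{\infty} f(x_2,z)\,dz. \qquad (*)
\]

Integrating (*) from $s_1$ to $s_2$ (with $s_1 < s_2$) over $x_1$, and from $s_2$ to $\infty$ over $x_2$, we have:

\[
\int_{s_1}^{s_2}\!\!\int_{y}^{\infty} f(x_1,z)\,dz\,dx_1 \int_{s_2}^{\infty} f_1(x_2)\,dx_2 \ \le\ \int_{s_2}^{\infty}\!\!\int_{y}^{\infty} f(x_2,z)\,dz\,dx_2 \int_{s_1}^{s_2} f_1(x_1)\,dx_1
\]

(the inequality (*) remains valid as long as the ranges of integration satisfy $x_1 \le x_2$), or

\[
\begin{vmatrix} \int_{s_1}^{s_2}\!\int_{y}^{\infty} f(x_1,z)\,dz\,dx_1 & \int_{s_2}^{\infty}\!\int_{y}^{\infty} f(x_2,z)\,dz\,dx_2 \\[4pt] \int_{s_1}^{s_2} f_1(x_1)\,dx_1 & \int_{s_2}^{\infty} f_1(x_2)\,dx_2 \end{vmatrix} \le 0.
\]

Adding the second column to the first we have

\[
\begin{vmatrix} \int_{s_1}^{\infty}\!\int_{y}^{\infty} f(x_1,z)\,dz\,dx_1 & \int_{s_2}^{\infty}\!\int_{y}^{\infty} f(x_2,z)\,dz\,dx_2 \\[4pt] \int_{s_1}^{\infty} f_1(x_1)\,dx_1 & \int_{s_2}^{\infty} f_1(x_2)\,dx_2 \end{vmatrix} \le 0.
\]

This implies that

\[
\frac{\int_{s_1}^{\infty}\!\int_{y}^{\infty} f(x_1,z)\,dz\,dx_1}{\int_{s_1}^{\infty} f_1(x_1)\,dx_1} \ \le\ \frac{\int_{s_2}^{\infty}\!\int_{y}^{\infty} f(x_2,z)\,dz\,dx_2}{\int_{s_2}^{\infty} f_1(x_2)\,dx_2},
\]

i.e. $P(Y>y \mid X>s_1) \le P(Y>y \mid X>s_2)$ for $s_1 < s_2$, which means that RTI(Y|X) is valid. To complete the proof of Theorem 2 we must establish that

RTI(Y|X) ⇒ PQD(X,Y).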


It is easy to show that A(X,Y) (association between $X$ and $Y$) implies PQD(X,Y).

Indeed:

A(X,Y) ⇒ $\operatorname{Cov}(f(X), g(Y)) \ge 0$

for all non-decreasing $f$ and $g$. Now let

$f(u) = 1$ if $u > x$ and 0 otherwise,

$g(v) = 1$ if $v > y$ and 0 otherwise.

Since these particular $f$ and $g$ are non-decreasing we have

\[
\begin{aligned}
0 \le \operatorname{Cov}(f(X), g(Y)) &= E(f(X)g(Y)) - Ef(X)\,Eg(Y) \\
&= P(X>x,\ Y>y) - P(X>x)P(Y>y) = P(X\le x,\ Y\le y) - P(X\le x)P(Y\le y).
\end{aligned}
\]

In other words PQD(X,Y) is implied.

The missing part involves verification of the implication RTI(Y|X) ⇒ A(X,Y). The proof of this proposition is quite long and constitutes the major proof of Esary and Proschan's 1972 paper in the Ann. of Math. Statist.

Final remark. If the variables $X$ and $Y$ take on values 0 and 1 only, all the above conditions of dependence are equivalent.

Indeed, in this case PQD(X,Y) ⇒ TP2(X,Y).

Consider

\[
D = \begin{vmatrix} P(X=0,\ Y=0) & P(X=0,\ Y=1) \\ P(X=1,\ Y=0) & P(X=1,\ Y=1) \end{vmatrix}
= \begin{vmatrix} 1 & P(Y=1) \\ P(X=1) & P(X=1,\ Y=1) \end{vmatrix}
\]

(adding the bottom row to the top one and the second column to the first)

\[
= P(X=1,\ Y=1) - P(X=1)P(Y=1).
\]

PQD(X,Y) ⇒ $P(X=1,\ Y=1) - P(X=1)P(Y=1) \ge 0$ ⇒ $D \ge 0$ ⇒ TP2(X,Y).
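The determinant identity in the final remark is easy to confirm numerically. The following sketch uses an arbitrary, hypothetical 2x2 probability table; the function name `binary_checks` is introduced only for this illustration.

```python
def binary_checks(p00, p01, p10, p11):
    """Return (D, P(X=1,Y=1) - P(X=1)P(Y=1)) for a 0/1-valued pair (X, Y).

    pij denotes P(X=i, Y=j); the four values are assumed to sum to 1.
    """
    d = p00 * p11 - p01 * p10
    px1 = p10 + p11  # P(X = 1)
    py1 = p01 + p11  # P(Y = 1)
    return d, p11 - px1 * py1


# Hypothetical table: P(X=0,Y=0)=0.4, P(X=0,Y=1)=0.1, P(X=1,Y=0)=0.2, P(X=1,Y=1)=0.3.
d, gap = binary_checks(0.4, 0.1, 0.2, 0.3)
```

The two returned quantities agree, so the sign of the TP2 determinant $D$ is the sign of $P(X=1,\ Y=1) - P(X=1)P(Y=1)$.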
Appendices

Appendix 1.

Details of the proof of Hoeffding's lemma (Lemma 3, Part 1).
Define:

$I(u,x) = 1$ if $u < x$

$\phantom{I(u,x)} = 0$ if $u \ge x$.

Auxiliary lemma 1. If $X$ is a random variable then $E\,I(u,X) = P(X>u)$.

Pf: $E\,I(u,X) = 1\cdot P(I(u,X) = 1) + 0\cdot P(I(u,X) = 0) = 1\cdot P(X>u) + 0 = P(X>u)$.

Auxiliary lemma 2. Let $X_1$, $Y_1$, $X_2$, $Y_2$ be random variables defined on the same probability space $(\Omega, F, P)$. Then,

\[
(X_1 - X_2)(Y_1 - Y_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(I(u,X_1) - I(u,X_2)\big)\big(I(v,Y_1) - I(v,Y_2)\big)\,du\,dv.
\]

Proof:

Both sides of the equality are random variables, and the double integral is interpreted as a two-dimensional Lebesgue integral. The integrability of the integrand of the right-hand double integral is justified by Fubini's theorem. This theorem assures that if

$\int_{-\infty}^{\infty}\big(I(u,X_1) - I(u,X_2)\big)\,du$ and $\int_{-\infty}^{\infty}\big(I(v,Y_1) - I(v,Y_2)\big)\,dv$ exist, so does

\[
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big[I(u,X_1) - I(u,X_2)\big]\big[I(v,Y_1) - I(v,Y_2)\big]\,du\,dv.
\]

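Now $\int_{-\infty}^{\infty}\big[I(u, X_1(\omega)) - I(u, X_2(\omega))\big]\,du$ is proved to be equal to $X_1(\omega) - X_2(\omega)$ for each $\omega \in \Omega$.

Consider the case $X_1(\omega) < X_2(\omega)$.

[Figure: the $u$-axis divided at $X_1(\omega)$ and $X_2(\omega)$ into the regions $u < X_1(\omega)$, $u = X_1(\omega)$, $X_1(\omega) < u < X_2(\omega)$, $u = X_2(\omega)$ and $X_2(\omega) < u$.]

In this case:

$I(u, X_1(\omega)) - I(u, X_2(\omega)) = 1 - 1 = 0$ if $u < X_1(\omega)$,

$I(u, X_1(\omega)) - I(u, X_2(\omega)) = 0 - 1 = -1$ if $u = X_1(\omega)$,

$I(u, X_1(\omega)) - I(u, X_2(\omega)) = 0 - 1 = -1$ if $X_1(\omega) < u < X_2(\omega)$,

$I(u, X_1(\omega)) - I(u, X_2(\omega)) = 0 - 0 = 0$ if $u = X_2(\omega)$,

$I(u, X_1(\omega)) - I(u, X_2(\omega)) = 0 - 0 = 0$ if $u > X_2(\omega)$.

(The values at the two single points $u = X_1(\omega)$ and $u = X_2(\omega)$ do not affect the integral.) We thus have only to consider the integral

$\int_{X_1(\omega)}^{X_2(\omega)}\big[I(u, X_1(\omega)) - I(u, X_2(\omega))\big]\,du$. This integral is interpreted as

\[
\lim_{\varepsilon_1\to 0,\ \varepsilon_2\to 0}\ \int_{X_1(\omega)+\varepsilon_1}^{X_2(\omega)-\varepsilon_2}\big[I(u, X_1(\omega)) - I(u, X_2(\omega))\big]\,du \quad\text{with } \varepsilon_1, \varepsilon_2 > 0.
\]

(We may handle this integral as a Riemann integral, since the Riemann sense and the Lebesgue sense coincide in this case for the function $I(u, X_1(\omega)) - I(u, X_2(\omega))$ on the set $[X_1(\omega), X_2(\omega)]$.) The last integral equals

\[
\lim_{\varepsilon_1\to 0,\ \varepsilon_2\to 0}\,(-1)\big(X_2(\omega) - \varepsilon_2 - X_1(\omega) - \varepsilon_1\big) = (-1)\big(X_2(\omega) - X_1(\omega)\big) = X_1(\omega) - X_2(\omega).
\]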

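In a similar manner

\[
\int_{-\infty}^{\infty}\big[I(v, Y_1(\omega)) - I(v, Y_2(\omega))\big]\,dv = Y_1(\omega) - Y_2(\omega).
\]

Therefore:

\[
\int_{-\infty}^{\infty}\big(I(u,X_1) - I(u,X_2)\big)\,du = X_1 - X_2, \qquad
\int_{-\infty}^{\infty}\big(I(v,Y_1) - I(v,Y_2)\big)\,dv = Y_1 - Y_2.
\]

So,

\[
\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(I(u,X_1) - I(u,X_2)\big)\big(I(v,Y_1) - I(v,Y_2)\big)\,du\,dv
&= \int_{-\infty}^{\infty}\big(I(u,X_1) - I(u,X_2)\big)\,du \int_{-\infty}^{\infty}\big(I(v,Y_1) - I(v,Y_2)\big)\,dv \quad\text{(by Fubini's theorem)} \\
&= (X_1 - X_2)(Y_1 - Y_2).
\end{aligned}
\]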
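Hoeffding's lemma, which this appendix supports, can be checked exactly for an integer-valued pair: the integrand $G_{XY}(u,v) - G_X(u)G_Y(v)$ is then constant on unit squares, so the double integral reduces to a finite sum. The sketch below uses a hypothetical joint distribution on $\{0,1,2\}^2$; the table and the function names are assumptions made for the illustration.

```python
# Hypothetical joint pmf of (X, Y) on {0, 1, 2} x {0, 1, 2}.
pmf = {
    (0, 0): 0.2, (0, 1): 0.1, (0, 2): 0.0,
    (1, 0): 0.1, (1, 1): 0.2, (1, 2): 0.1,
    (2, 0): 0.0, (2, 1): 0.1, (2, 2): 0.2,
}


def survival_xy(u, v):
    """G_XY(u, v) = P(X > u, Y > v)."""
    return sum(p for (x, y), p in pmf.items() if x > u and y > v)


def survival_x(u):
    """G_X(u) = P(X > u)."""
    return sum(p for (x, _), p in pmf.items() if x > u)


def survival_y(v):
    """G_Y(v) = P(Y > v)."""
    return sum(p for (_, y), p in pmf.items() if y > v)


# Direct computation: Cov(X, Y) = E[XY] - E[X] E[Y].
ex = sum(x * p for (x, _), p in pmf.items())
ey = sum(y * p for (_, y), p in pmf.items())
exy = sum(x * y * p for (x, y), p in pmf.items())
cov_direct = exy - ex * ey

# Hoeffding's formula: the integrand vanishes outside [0, 2) x [0, 2) and is
# constant on each unit square [k, k+1) x [l, l+1), so the integral is a sum.
cov_hoeffding = sum(
    survival_xy(k, l) - survival_x(k) * survival_y(l)
    for k in (0, 1)
    for l in (0, 1)
)
```

Both computations give the same covariance, as the lemma asserts.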




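Appendix 2.

Theorem (Jensen (1971))

Let $\mu_2(x,y)$ be a two-dimensional distribution function which admits the series expansion

\[
d\mu_2(x,y) = \Big[1 + \sum_{r=1}^{\infty} a_r\,\varphi_r(x)\varphi_r(y)\Big]\,d\mu_1(x)\,d\mu_1(y)
\]

in the functions $\{\varphi_r(\cdot)\}$ and $\mu_1(\cdot)$. Then a sufficient condition that the inequality

\[
\iint_{A\times A} d\mu_2(x,y) \ \ge\ \int_A d\mu_1(x)\int_A d\mu_1(y)
\]

holds for every measurable set $A$ is that the sequence $\{a_r\}$ be non-negative.

Remark: Note that both marginals of $\mu_2(\cdot,\cdot)$ are $\mu_1(\cdot)$. If all $a_r = 0$, $r = 1, 2, \dots$, we obtain the independent case.

Proof:

\[
\iint_{A\times A} d\mu_2(x,y) = \int_A d\mu_1(x)\int_A d\mu_1(y) + \sum_{r=1}^{\infty} a_r \int_A \varphi_r(x)\,d\mu_1(x)\int_A \varphi_r(y)\,d\mu_1(y)
= \int_A d\mu_1(x)\int_A d\mu_1(y) + \sum_{r=1}^{\infty} a_r\Big(\int_A \varphi_r\,d\mu_1\Big)^2,
\]

and every term of the last sum is non-negative when $a_r \ge 0$.

Example: In the bivariate normal the well known expansion is:

\[
d\mu_2(x,y) = \Big[1 + \sum_{r=1}^{\infty} \rho^r\,\varphi_r(x)\varphi_r(y)\Big]\,d\mu_1(x)\,d\mu_1(y),
\]

where $\varphi_r(x)$ are the (normalized) Hermite polynomials and $\rho$ is the correlation coefficient.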


Appendix 3.

Theorem 1 (Shannon (1948))

The function $h(p) = -\sum_{i=1}^{N} p_i \log p_i$ satisfies $h(p) \le \log N$, with equality iff $p_i = \frac{1}{N}$, $i = 1, 2, \dots, N$.

Lemma 1. Let $p_1, p_2, \dots, p_N$ and $q_1, q_2, \dots, q_N$ be arbitrary positive numbers with $\sum_{i=1}^{N} p_i = \sum_{i=1}^{N} q_i = 1$. Then,

\[
-\sum_{i=1}^{N} p_i \log p_i \ \le\ -\sum_{i=1}^{N} p_i \log q_i
\]

with equality iff $p_i = q_i$, $i = 1, \dots, N$.

Proof: Consider the function $y = \log x$. Elementary observations show that

\[
\log x \le x - 1 \quad\text{for all } x > 0.
\]

Hence

\[
\log\frac{q_i}{p_i} \le \frac{q_i}{p_i} - 1, \qquad i = 1, 2, \dots, N.
\]

Therefore,

\[
p_i \log q_i - p_i \log p_i \le q_i - p_i.
\]

Summing up,

\[
\sum_{i=1}^{N} p_i \log q_i - \sum_{i=1}^{N} p_i \log p_i \le 0
\]

or

\[
-\sum_{i=1}^{N} p_i \log p_i \ \le\ -\sum_{i=1}^{N} p_i \log q_i.
\]

The equality will hold iff $\log\frac{q_i}{p_i} = \frac{q_i}{p_i} - 1$ for all $i$, i.e. iff $q_i = p_i$ for all $i = 1, 2, \dots, N$.

Applying this lemma with $q_i = \frac{1}{N}$, we obtain

\[
-\sum_{i=1}^{N} p_i \log p_i \ \le\ -\sum_{i=1}^{N} p_i \log\frac{1}{N} = \log N
\]

with equality iff $p_i = \frac{1}{N}$, $i = 1, 2, \dots, N$.
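Both Lemma 1 and Theorem 1 can be illustrated with a few lines of arithmetic. The sketch below uses natural logarithms and a hypothetical four-point distribution; the names `entropy` and `cross_entropy` are introduced for the illustration only.

```python
import math


def entropy(p):
    """h(p) = -sum p_i log p_i (natural logarithm); zero terms are skipped."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """-sum p_i log q_i, the right-hand side of Lemma 1."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.125, 0.125]  # hypothetical distribution, N = 4
q = [0.25, 0.25, 0.25, 0.25]   # uniform distribution, q_i = 1/N

h_p = entropy(p)            # equals 1.75 * log 2
h_pq = cross_entropy(p, q)  # equals log 4 for uniform q
log_n = math.log(len(p))
h_uniform = entropy(q)      # attains the bound log N
```

Taking `q` uniform turns Lemma 1 into the bound of Theorem 1: `h_p <= h_pq == log_n`, with the bound attained by the uniform distribution.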


Theorem 2. Let $X$ and $Y$ be discrete r.v.'s with joint entropy $-\sum_{i,j=1}^{n,m} p_{ij}\log p_{ij}$ and marginal entropies $-\sum_{i=1}^{n} p_i\log p_i$ and $-\sum_{j=1}^{m} q_j\log q_j$, where $P(X = x_i,\ Y = y_j) = p_{ij}$; $P(X = x_i) = p_i$, $i = 1, 2, \dots, n$; and $P(Y = y_j) = q_j$, $j = 1, 2, \dots, m$.

Then:

\[
-\sum_{i,j=1}^{n,m} p_{ij}\log p_{ij} \ \le\ -\sum_{i,j=1}^{n,m} p_i q_j \log p_i q_j.
\]

Proof:

Let

\[
h(p) = -\sum_{i=1}^{n} p_i \log p_i = -\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\log p_i,
\qquad
h(q) = -\sum_{j=1}^{m} q_j \log q_j = -\sum_{j=1}^{m}\sum_{i=1}^{n} p_{ij}\log q_j.
\]

Then, by Lemma 1,

\[
h(p) + h(q) = -\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\log p_i q_j \ \ge\ -\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\log p_{ij}.
\]

However,

\[
-\sum_{i,j=1}^{n,m} p_i q_j \log p_i q_j = -\sum_{i,j=1}^{n,m} p_i q_j(\log p_i + \log q_j) = h(p) + h(q).
\]

□

(Equality is attained iff $p_{ij} = p_i q_j$, $i = 1, \dots, n$, $j = 1, \dots, m$.)
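Theorem 2 states that the joint entropy never exceeds the sum of the marginal entropies, with equality under independence. A minimal numerical sketch, using a hypothetical 2x2 joint table and natural logarithms (the table and the variable names are assumptions made for the illustration):

```python
import math


def entropy(probs):
    """-sum p log p over the given probabilities; zero terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical joint distribution p_ij (rows: values of X, columns: values of Y).
joint = [[0.30, 0.10],
         [0.10, 0.50]]

p = [sum(row) for row in joint]        # marginal of X
q = [sum(col) for col in zip(*joint)]  # marginal of Y

h_joint = entropy([v for row in joint for v in row])
h_sum = entropy(p) + entropy(q)

# The product table p_i q_j has the same marginals and attains equality.
indep = [[pi * qj for qj in q] for pi in p]
h_indep = entropy([v for row in indep for v in row])
```

The difference `h_sum - h_joint` is non-negative and vanishes exactly when the table factorizes, which is the information-theoretic measure of dependence discussed in the notes.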


Appendix 4.

Continuous and multivariate extensions of the information-theoretic measure of dependence.

Let $(X,Y)$ be a random vector with joint density $f(x,y)$ and marginals $f(x) = \int f(x,y)\,dy$ and $g(y) = \int f(x,y)\,dx$. Assume that $f(x,y) > 0$ a.e. with respect to two-dimensional Lebesgue measure, that $f(x) > 0$, $g(y) > 0$ a.e. with respect to Lebesgue measure, and that

\[
E\log f(X,Y), \quad E\log f(X), \quad E\log g(Y)
\]

exist.

Lemma 1.

\[
-\int g(y)\log g(y)\,dy \ \le\ -\int g(y)\log f(y)\,dy
\]

or $-E\log g(Y) \le -E\log f(Y)$, with equality iff $g(x) = f(x)$ a.e. with respect to Lebesgue measure.

Proof: From the basic inequality

\[
\log x \le x - 1 \quad\text{for all } x > 0
\]

we have

\[
\log f(y) - \log g(y) \le \frac{f(y)}{g(y)} - 1
\]

or

\[
g(y)\log f(y) - g(y)\log g(y) \le f(y) - g(y),
\]

hence

\[
-\int g(y)\log g(y)\,dy \ \le\ -\int g(y)\log f(y)\,dy
\]

(since $\int f(y)\,dy = \int g(y)\,dy = 1$). The equality part follows from the fact that $\log x = x - 1$ iff $x = 1$.

Lemma 2. (Bivariate version of Lemma 1):

\[
-\iint f(x,y)\log f(x,y)\,dx\,dy \ \le\ -\iint f(x,y)\log f(x)g(y)\,dx\,dy
\]

with equality iff $f(x,y) = f(x)g(y)$ a.e. with respect to Lebesgue measure on $R^2$.

Proof: Using the basic inequality of Lemma 1:

\[
\log f(x)g(y) - \log f(x,y) \le \frac{f(x)g(y)}{f(x,y)} - 1
\]

or

\[
f(x,y)\log f(x)g(y) - f(x,y)\log f(x,y) \le f(x)g(y) - f(x,y)
\]

or

\[
\iint f(x,y)\log f(x)g(y)\,dx\,dy - \iint f(x,y)\log f(x,y)\,dx\,dy \le 0
\]

or

\[
-\iint f(x,y)\log f(x,y)\,dx\,dy \ \le\ -\iint f(x,y)\log f(x)g(y)\,dx\,dy. \qquad □
\]

(The equality part follows from the basic inequality in Lemma 1.)

Remark. Lemma 2 is a particular case of a more general result. In fact we can prove that:

\[
-\iint f(x,y)\log f(x,y)\,dx\,dy \ \le\ -\iint f(x,y)\log h(x,y)\,dx\,dy,
\]

where $h(x,y)$ is another bivariate density with the same support as $f(x,y)$. This can be derived by considering the inequality

\[
\log\frac{h(x,y)}{f(x,y)} \le \frac{h(x,y)}{f(x,y)} - 1.
\]

We are now ready to prove:

Theorem 1. Under the assumptions above,

\[
-\iint f(x,y)\log f(x,y)\,dx\,dy \ \le\ -\iint f(x)g(y)\log f(x)g(y)\,dx\,dy,
\]

with equality iff $f(x,y) = f(x)g(y)$ a.e. with respect to Lebesgue measure on $R^2$, i.e. iff $X$, $Y$ are independent.

Proof:

\[
\begin{aligned}
\text{RHS} &= -\iint f(x)g(y)[\log f(x) + \log g(y)]\,dx\,dy \\
&= -\int f(x)\log f(x)\,dx - \int g(y)\log g(y)\,dy \\
&= -\iint f(x,y)\log f(x)\,dx\,dy - \iint f(x,y)\log g(y)\,dy\,dx \\
&= -\iint f(x,y)\log f(x)g(y)\,dx\,dy.
\end{aligned}
\]

The interchange of integral signs is justified by Fubini's theorem (in the second double integral) since

\[
\iint |f(x,y)\log g(y)|\,dx\,dy = \int\Big[\int |f(x,y)|\,dx\Big]\,|\log g(y)|\,dy = \int g(y)\,|\log g(y)|\,dy = E|\log g(Y)| < \infty
\]

by assumption.

Now applying Lemma 2:

\[
-\iint f(x,y)\log f(x,y)\,dx\,dy \ \le\ -\iint f(x,y)\log f(x)g(y)\,dx\,dy = \text{RHS}. \qquad □
\]
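A concrete continuous instance of Theorem 1 is the bivariate normal with unit variances and correlation $\rho$, for which the entropies are available in closed form. These closed forms are standard results that are not derived in the notes: the joint entropy is $\log(2\pi e) + \tfrac12\log(1-\rho^2)$ and each marginal entropy is $\tfrac12\log(2\pi e)$. The sketch below, with hypothetical function names, evaluates the gap between the two sides of the theorem.

```python
import math


def joint_entropy(rho):
    """Differential entropy of a bivariate normal with unit variances, |rho| < 1."""
    return math.log(2 * math.pi * math.e) + 0.5 * math.log(1 - rho ** 2)


def marginal_entropy():
    """Differential entropy of a standard normal marginal."""
    return 0.5 * math.log(2 * math.pi * math.e)


# Gap between the sum of marginal entropies (RHS) and the joint entropy (LHS).
gaps = {
    rho: 2 * marginal_entropy() - joint_entropy(rho)
    for rho in (0.0, 0.3, 0.9)
}
```

The gap equals $-\tfrac12\log(1-\rho^2)$: it is zero for $\rho = 0$ (independence) and grows with $|\rho|$.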


Corollary 1. In a denumerable discrete case, we have

\[
-\sum_{i,j} p_{ij}\log p_{ij} \ \le\ -\sum_{i,j} p_i q_j \log p_i q_j,
\]

where $p_i = \sum_{j=1}^{\infty} p_{ij}$ and $q_j = \sum_{i=1}^{\infty} p_{ij}$.

Remarks: A denumerable discrete version of Lemma 2 is:

\[
-\sum_{i,j} p_{ij}\log p_{ij} \ \le\ -\sum_{i,j} p_{ij}\log p_i q_j,
\]

where $p_i = \sum_{j=1}^{\infty} p_{ij}$ and $q_j = \sum_{i=1}^{\infty} p_{ij}$.

A special case of the Remark to Lemma 2 is:

\[
-\sum_{i,j} p_{ij}\log p_{ij} \ \le\ -\sum_{i,j} p_{ij}\log h_{ij},
\]

where $h_{ij}$ is another bivariate discrete distribution with the same probability support as $p_{ij}$.


Multivariate extensions of the above cases can be summarized by the following two theorems. The first is an extension of Lemma 2, while the second is that of Theorem 1.

Theorem 2.

Let $(X,Y)$ be an $(m+n)$-dimensional random vector.

Then

\[
-\int\!\cdots\!\int_{m+n} f(x,y)\log f(x,y)\,dx\,dy \ \le\ -\int\!\cdots\!\int_{m+n} f(x,y)\log f(x)g(y)\,dx\,dy
\]

or $-E\log f(X,Y) \le -E\log f(X)g(Y)$.

(With equality iff $X$, $Y$ are independent.)

Theorem 3.

\[
-\int\!\cdots\!\int_{m+n} f(x,y)\log f(x,y)\,dx\,dy \ \le\ -\int\!\cdots\!\int_{m+n} f(x)g(y)\log f(x)g(y)\,dx\,dy
\]

(With equality iff $X$, $Y$ are independent.)


Appendix 5.

Theorem. For the bivariate FGM (Farlie-Gumbel-Morgenstern) family of bivariate distributions given by


i 


N<  A 
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or  equivalently 


the  G-orthant  positive  dependence  is  equivalent  to  the  property 


h^,(X.)  > hj((x)j  . 

Proof:  For  this  family  R^(Xj,X2)  ^ 1 iff  a s 0. 
Moreover  for  this  family: 


\,X. 


h^  (x.)  - a{l  - 

J ^3-j 

[1  - {e(C^  (Xj))‘^  - 


where  $ 


1 ♦ [a  Fy  ^ » 

^3-j  ^ J 


j 


Since  8 has  the  same  sign  as  a,  h^  (x^)  - hj^  (x)j  has  the  same 

j 1’  2 

sign  a;  thus  h^  (x.)  > <■->  a > 0 <->  R^(Xj^,X2)  ^ 1.  I 

J ^ 
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